* [RFC/T/D][PATCH 0/2] KVM page cache optimization (v2)
@ 2010-06-08 15:51 ` Balbir Singh
  0 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-08 15:51 UTC (permalink / raw)
  To: kvm; +Cc: Avi Kivity, linux-mm, Balbir Singh, linux-kernel

This is version 2 of the page cache control patches for
KVM. The series has two patches: the first controls the
amount of unmapped page cache usage via a boot parameter
and a sysctl; the second controls page and slab cache via
the balloon driver. Both patches make heavy use of the
zone_reclaim() functionality already present in the kernel.

page-cache-control
balloon-page-cache

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread


* [RFC][PATCH 1/2] Linux/Guest unmapped page cache control
  2010-06-08 15:51 ` Balbir Singh
@ 2010-06-08 15:51   ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-08 15:51 UTC (permalink / raw)
  To: kvm; +Cc: Avi Kivity, linux-mm, Balbir Singh, linux-kernel

Selectively control Unmapped Page Cache (nospam version)

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch implements unmapped page cache control via preferred
page cache reclaim. The current patch hooks into kswapd and reclaims
page cache if the user has requested unmapped page cache control.
This is useful in the following scenarios:

- In a virtualized environment with cache=writethrough, we see
  double caching - one copy in the host and one in the guest. As
  we try to scale guests, cache usage across the system grows.
  The goal of this patch is to reclaim page cache when Linux is running
  as a guest, and to get the host to hold the page cache and manage it.
  There might be temporary duplication, but in the long run, memory
  in the guests would be used for mapped pages.
- The option is controlled via a boot parameter, so the administrator
  can selectively turn it on, on a need-to-use basis.

A lot of the code is borrowed from the zone_reclaim_mode logic in
__zone_reclaim(). One might argue that with ballooning and
KSM this feature is not very useful, but even with ballooning,
we need extra logic to balloon multiple VMs, and it is hard
to figure out the correct amount of memory to balloon. With these
patches applied, each guest has a sufficient amount of free memory
available that can be easily seen and reclaimed by the balloon driver.
The additional memory in the guest can be reused for additional
applications, used to start additional guests, or used to balance memory
in the host.

KSM currently does not de-duplicate host and guest page cache. The goal
of this patch is to help automatically balance unmapped page cache when
instructed to do so.

There are some magic numbers in use in the code: UNMAPPED_PAGE_RATIO
and the number of pages to reclaim when the unmapped_page_control argument
is supplied. These numbers were chosen to avoid reaping page cache too
aggressively or too frequently, while still providing control.
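
As an illustration (the numbers here are made up), with the default
min_unmapped_ratio of 1% on a zone of 1,000,000 pages:

    min_unmapped_pages = 1,000,000 * 1 / 100 = 10,000 pages
    kswapd is woken for balancing once unmapped file cache exceeds
        UNMAPPED_PAGE_RATIO (16) * 10,000 = 160,000 pages
    each balancing pass then targets (unmapped - 10,000) / 8 pages,
        e.g. (200,000 - 10,000) / 8 = 23,750 pages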

The sysctl for min_unmapped_ratio provides further control from
within the guest on the amount of unmapped pages to reclaim.
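
For illustration, with the patch applied the two control points would be
used roughly as follows (the sysctl is the existing vm.min_unmapped_ratio;
this is a sketch, not part of the patch):

    # kernel command line: enable unmapped page cache control
    ... unmapped_page_control ...

    # reclaim unmapped page cache only once it exceeds 5% of a zone
    sysctl -w vm.min_unmapped_ratio=5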

The patch is applied against mmotm feb-11-2010.

Guest Usage without boot parameter (memory in KB)
----------------------------
MemFree Cached Time
19900   292912 137
17540   296196 139
17900   296124 141
19356   296660 141

Host usage:  (memory in KB)

RSS     Cache   mapped  swap
2788664 781884  3780    359536

Guest Usage with boot parameter (memory in KB)
-------------------------
MemFree Cached   Time
244824  74828   144
237840  81764   143
235880  83044   138
239312  80092   148

Host usage: (memory in KB)

RSS     Cache   mapped  swap
2700184 958012  334848  398412

TODOs
-----
1. Balance slab cache as well
2. Invoke the balance routines from the balloon driver

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/mmzone.h |    2 -
 include/linux/swap.h   |    3 +
 mm/page_alloc.c        |    9 ++-
 mm/vmscan.c            |  165 ++++++++++++++++++++++++++++++++++++------------
 4 files changed, 134 insertions(+), 45 deletions(-)


diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b4d109e..9f96b6d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -293,12 +293,12 @@ struct zone {
 	 */
 	unsigned long		lowmem_reserve[MAX_NR_ZONES];
 
+	unsigned long		min_unmapped_pages;
 #ifdef CONFIG_NUMA
 	int node;
 	/*
 	 * zone reclaim becomes active if more unmapped pages exist.
 	 */
-	unsigned long		min_unmapped_pages;
 	unsigned long		min_slab_pages;
 #endif
 	struct per_cpu_pageset __percpu *pageset;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index ff4acea..f92f1ee 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -251,10 +251,11 @@ extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern long vm_total_pages;
+extern bool should_balance_unmapped_pages(struct zone *zone);
 
+extern int sysctl_min_unmapped_ratio;
 #ifdef CONFIG_NUMA
 extern int zone_reclaim_mode;
-extern int sysctl_min_unmapped_ratio;
 extern int sysctl_min_slab_ratio;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
 #else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 431214b..fee9420 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1641,6 +1641,9 @@ zonelist_scan:
 			unsigned long mark;
 			int ret;
 
+			if (should_balance_unmapped_pages(zone))
+				wakeup_kswapd(zone, order);
+
 			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
 			if (zone_watermark_ok(zone, order, mark,
 				    classzone_idx, alloc_flags))
@@ -4069,10 +4072,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
-#ifdef CONFIG_NUMA
-		zone->node = nid;
 		zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
 						/ 100;
+#ifdef CONFIG_NUMA
+		zone->node = nid;
 		zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
 #endif
 		zone->name = zone_names[j];
@@ -4982,7 +4985,6 @@ int min_free_kbytes_sysctl_handler(ctl_table *table, int write,
 	return 0;
 }
 
-#ifdef CONFIG_NUMA
 int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -4999,6 +5001,7 @@ int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
 	return 0;
 }
 
+#ifdef CONFIG_NUMA
 int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c7e57c..27bc536 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -136,6 +136,18 @@ static DECLARE_RWSEM(shrinker_rwsem);
 #define scanning_global_lru(sc)	(1)
 #endif
 
+static int unmapped_page_control __read_mostly;
+
+static int __init unmapped_page_control_parm(char *str)
+{
+	unmapped_page_control = 1;
+	/*
+	 * XXX: Should we tweak swappiness here?
+	 */
+	return 1;
+}
+__setup("unmapped_page_control", unmapped_page_control_parm);
+
 static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
 						  struct scan_control *sc)
 {
@@ -1986,6 +1998,103 @@ static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
 }
 
 /*
+ * Percentage of pages in a zone that must be unmapped for zone_reclaim to
+ * occur.
+ */
+int sysctl_min_unmapped_ratio = 1;
+/*
+ * Priority for ZONE_RECLAIM. This determines the fraction of pages
+ * of a node considered for each zone_reclaim. 4 scans 1/16th of
+ * a zone.
+ */
+#define ZONE_RECLAIM_PRIORITY 4
+
+
+#define RECLAIM_OFF 0
+#define RECLAIM_ZONE (1<<0)	/* Run shrink_inactive_list on the zone */
+#define RECLAIM_WRITE (1<<1)	/* Writeout pages during reclaim */
+#define RECLAIM_SWAP (1<<2)	/* Swap pages out during reclaim */
+
+static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
+{
+	unsigned long file_mapped = zone_page_state(zone, NR_FILE_MAPPED);
+	unsigned long file_lru = zone_page_state(zone, NR_INACTIVE_FILE) +
+		zone_page_state(zone, NR_ACTIVE_FILE);
+
+	/*
+	 * It's possible for there to be more file mapped pages than
+	 * accounted for by the pages on the file LRU lists because
+	 * tmpfs pages accounted for as ANON can also be FILE_MAPPED
+	 */
+	return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
+}
+
+/*
+ * Helper function to reclaim unmapped pages, we might add something
+ * similar to this for slab cache as well. Currently this function
+ * is shared with __zone_reclaim()
+ */
+static inline void
+zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
+				unsigned long nr_pages)
+{
+	int priority;
+	/*
+	 * Free memory by calling shrink zone with increasing
+	 * priorities until we have enough memory freed.
+	 */
+	priority = ZONE_RECLAIM_PRIORITY;
+	do {
+		note_zone_scanning_priority(zone, priority);
+		shrink_zone(priority, zone, sc);
+		priority--;
+	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
+}
+
+/*
+ * Routine to balance unmapped pages, inspired from the code under
+ * CONFIG_NUMA that does unmapped page and slab page control by keeping
+ * min_unmapped_pages in the zone. We currently reclaim just unmapped
+ * pages, slab control will come in soon, at which point this routine
+ * should be called balance cached pages
+ */
+static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
+						struct scan_control *sc)
+{
+	if (unmapped_page_control &&
+		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
+		struct scan_control nsc;
+		unsigned long nr_pages;
+
+		nsc = *sc;
+
+		nsc.swappiness = 0;
+		nsc.may_writepage = 0;
+		nsc.may_unmap = 0;
+		nsc.nr_reclaimed = 0;
+
+		nr_pages = zone_unmapped_file_pages(zone) -
+				zone->min_unmapped_pages;
+		/* Magically try to reclaim an eighth of the unmapped cache pages */
+		nr_pages >>= 3;
+
+		zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
+		return nsc.nr_reclaimed;
+	}
+	return 0;
+}
+
+#define UNMAPPED_PAGE_RATIO 16
+bool should_balance_unmapped_pages(struct zone *zone)
+{
+	if (unmapped_page_control &&
+		(zone_unmapped_file_pages(zone) >
+			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
+		return true;
+	return false;
+}
+
+/*
  * For kswapd, balance_pgdat() will work across all this node's zones until
  * they are all at high_wmark_pages(zone).
  *
@@ -2074,6 +2183,12 @@ loop_again:
 				shrink_active_list(SWAP_CLUSTER_MAX, zone,
 							&sc, priority, 0);
 
+			/*
+			 * We do unmapped page balancing once here and once
+			 * below, so that we don't lose out
+			 */
+			balance_unmapped_pages(priority, zone, &sc);
+
 			if (!zone_watermark_ok(zone, order,
 					high_wmark_pages(zone), 0, 0)) {
 				end_zone = i;
@@ -2115,6 +2230,13 @@ loop_again:
 
 			nid = pgdat->node_id;
 			zid = zone_idx(zone);
+
+			/*
+			 * Balance unmapped pages upfront, this should be
+			 * really cheap
+			 */
+			balance_unmapped_pages(priority, zone, &sc);
+
 			/*
 			 * Call soft limit reclaim before calling shrink_zone.
 			 * For now we ignore the return value
@@ -2336,7 +2458,8 @@ void wakeup_kswapd(struct zone *zone, int order)
 		return;
 
 	pgdat = zone->zone_pgdat;
-	if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
+	if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0) &&
+		!should_balance_unmapped_pages(zone))
 		return;
 	if (pgdat->kswapd_max_order < order)
 		pgdat->kswapd_max_order = order;
@@ -2502,44 +2625,12 @@ module_init(kswapd_init)
  */
 int zone_reclaim_mode __read_mostly;
 
-#define RECLAIM_OFF 0
-#define RECLAIM_ZONE (1<<0)	/* Run shrink_inactive_list on the zone */
-#define RECLAIM_WRITE (1<<1)	/* Writeout pages during reclaim */
-#define RECLAIM_SWAP (1<<2)	/* Swap pages out during reclaim */
-
-/*
- * Priority for ZONE_RECLAIM. This determines the fraction of pages
- * of a node considered for each zone_reclaim. 4 scans 1/16th of
- * a zone.
- */
-#define ZONE_RECLAIM_PRIORITY 4
-
-/*
- * Percentage of pages in a zone that must be unmapped for zone_reclaim to
- * occur.
- */
-int sysctl_min_unmapped_ratio = 1;
-
 /*
  * If the number of slab pages in a zone grows beyond this percentage then
  * slab reclaim needs to occur.
  */
 int sysctl_min_slab_ratio = 5;
 
-static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
-{
-	unsigned long file_mapped = zone_page_state(zone, NR_FILE_MAPPED);
-	unsigned long file_lru = zone_page_state(zone, NR_INACTIVE_FILE) +
-		zone_page_state(zone, NR_ACTIVE_FILE);
-
-	/*
-	 * It's possible for there to be more file mapped pages than
-	 * accounted for by the pages on the file LRU lists because
-	 * tmpfs pages accounted for as ANON can also be FILE_MAPPED
-	 */
-	return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
-}
-
 /* Work out how many page cache pages we can reclaim in this reclaim_mode */
 static long zone_pagecache_reclaimable(struct zone *zone)
 {
@@ -2577,7 +2668,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	const unsigned long nr_pages = 1 << order;
 	struct task_struct *p = current;
 	struct reclaim_state reclaim_state;
-	int priority;
 	struct scan_control sc = {
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
@@ -2607,12 +2697,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		 * Free memory by calling shrink zone with increasing
 		 * priorities until we have enough memory freed.
 		 */
-		priority = ZONE_RECLAIM_PRIORITY;
-		do {
-			note_zone_scanning_priority(zone, priority);
-			shrink_zone(priority, zone, &sc);
-			priority--;
-		} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
+		zone_reclaim_unmapped_pages(zone, &sc, nr_pages);
 	}
 
 	slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);

-- 
	Three Cheers,
	Balbir

^ permalink raw reply related	[flat|nested] 96+ messages in thread


* [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-08 15:51 ` Balbir Singh
@ 2010-06-08 15:51   ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-08 15:51 UTC (permalink / raw)
  To: kvm; +Cc: Avi Kivity, linux-mm, Balbir Singh, linux-kernel

Balloon unmapped page cache pages first

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch builds on the ballooning infrastructure by ballooning unmapped
page cache pages first. It looks for low hanging fruit first and tries
to reclaim clean unmapped pages first.

This patch brings zone_reclaim() and other dependencies out of CONFIG_NUMA
and then reuses the zone_reclaim_mode logic if __GFP_FREE_CACHE is passed
in the gfp_mask. The virtio balloon driver has been changed to use
__GFP_FREE_CACHE.
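
For illustration, this is roughly how a caller asks the allocator to try
freeing unmapped cache before failing an allocation; it mirrors the
virtio_balloon hunk in the diff below and is only a sketch:

	struct page *page;

	/*
	 * __GFP_FREE_CACHE (added by this patch) lets the allocator run
	 * the zone_reclaim() path even when zone_reclaim_mode is 0.
	 */
	page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY | __GFP_NOMEMALLOC |
			  __GFP_NOWARN | __GFP_FREE_CACHE);
	if (!page)
		return;		/* stop inflating the balloon for now */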

Tests:

I ran a simple filter function that kept frequently ballooning a single VM
running kernbench. The VM was configured with 2GB of memory and 2 VCPUs.
The filter was a triangular wave function that continuously ballooned
the VM under study between 500MB and 1500MB. The run times of the VM
with and without the changes are shown below; they show no significant
impact from the changes.

With changes

Elapsed Time 223.86 (1.52822)
User Time 191.01 (0.65395)
System Time 199.468 (2.43616)
Percent CPU 174 (1)
Context Switches 103182 (595.05)
Sleeps 39107.6 (1505.67)

Without changes

Elapsed Time 225.526 (2.93102)
User Time 193.53 (3.53626)
System Time 199.832 (3.26281)
Percent CPU 173.6 (1.14018)
Context Switches 103744 (1311.53)
Sleeps 39383.2 (831.865)

The key advantage was that it resulted in lesser RSS usage in the host and
more cached usage, indicating that the caching had been pushed towards
the host. The guest cached memory usage was lower and free memory in
the guest was also higher.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 drivers/virtio/virtio_balloon.c |    3 ++-
 include/linux/gfp.h             |    8 +++++++-
 include/linux/swap.h            |    9 +++------
 mm/page_alloc.c                 |    3 ++-
 mm/vmscan.c                     |    2 +-
 5 files changed, 15 insertions(+), 10 deletions(-)


diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0f1da45..609a9c2 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -104,7 +104,8 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 
 	for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns++) {
 		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-					__GFP_NOMEMALLOC | __GFP_NOWARN);
+					__GFP_NOMEMALLOC | __GFP_NOWARN |
+					__GFP_FREE_CACHE);
 		if (!page) {
 			if (printk_ratelimit())
 				dev_printk(KERN_INFO, &vb->vdev->dev,
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 975609c..9048259 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -61,12 +61,18 @@ struct vm_area_struct;
 #endif
 
 /*
+ * While allocating pages, try to free cache pages first. Note the
+ * heavy dependency on zone_reclaim_mode logic
+ */
+#define __GFP_FREE_CACHE ((__force gfp_t)0x400000u) /* Free cache first */
+
+/*
  * This may seem redundant, but it's a way of annotating false positives vs.
  * allocations that simply cannot be supported (e.g. page tables).
  */
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
 
-#define __GFP_BITS_SHIFT 22	/* Room for 22 __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 23	/* Room for 23 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* This equals 0, but use constants in case they ever change */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index f92f1ee..f77c603 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -254,16 +254,13 @@ extern long vm_total_pages;
 extern bool should_balance_unmapped_pages(struct zone *zone);
 
 extern int sysctl_min_unmapped_ratio;
-#ifdef CONFIG_NUMA
-extern int zone_reclaim_mode;
 extern int sysctl_min_slab_ratio;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
+
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
 #else
 #define zone_reclaim_mode 0
-static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
-{
-	return 0;
-}
 #endif
 
 extern int page_evictable(struct page *page, struct vm_area_struct *vma);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fee9420..d977b36 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1649,7 +1649,8 @@ zonelist_scan:
 				    classzone_idx, alloc_flags))
 				goto try_this_zone;
 
-			if (zone_reclaim_mode == 0)
+			if (zone_reclaim_mode == 0 &&
+				!(gfp_mask & __GFP_FREE_CACHE))
 				goto this_zone_full;
 
 			ret = zone_reclaim(zone, gfp_mask, order);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 27bc536..393bee5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2624,6 +2624,7 @@ module_init(kswapd_init)
  * the watermarks.
  */
 int zone_reclaim_mode __read_mostly;
+#endif
 
 /*
  * If the number of slab pages in a zone grows beyond this percentage then
@@ -2780,7 +2781,6 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 
 	return ret;
 }
-#endif
 
 /*
  * page_evictable - test whether a page is evictable

-- 
	Three Cheers,
	Balbir

^ permalink raw reply related	[flat|nested] 96+ messages in thread


* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-08 15:51   ` Balbir Singh
@ 2010-06-10  9:43     ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-10  9:43 UTC (permalink / raw)
  To: Balbir Singh; +Cc: kvm, linux-mm, linux-kernel

On 06/08/2010 06:51 PM, Balbir Singh wrote:
> Balloon unmapped page cache pages first
>
> From: Balbir Singh<balbir@linux.vnet.ibm.com>
>
> This patch builds on the ballooning infrastructure by ballooning unmapped
> page cache pages first. It looks for low hanging fruit first and tries
> to reclaim clean unmapped pages first.
>    

I'm not sure victimizing unmapped cache pages is a good idea.  Shouldn't 
page selection use the LRU for recency information instead of the cost 
of guest reclaim?  Dropping a frequently used unmapped cache page can be 
more expensive than dropping an unused text page that was loaded as part 
of some executable's initialization and forgotten.

Many workloads have many unmapped cache pages, for example static web 
serving and the all-important kernel build.

> The key advantage was that it resulted in lesser RSS usage in the host and
> more cached usage, indicating that the caching had been pushed towards
> the host. The guest cached memory usage was lower and free memory in
> the guest was also higher.
>    

Caching in the host is only helpful if the cache can be shared, 
otherwise it's better to cache in the guest.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread


* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-10  9:43     ` Avi Kivity
@ 2010-06-10 14:25       ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-10 14:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-mm, linux-kernel

* Avi Kivity <avi@redhat.com> [2010-06-10 12:43:11]:

> On 06/08/2010 06:51 PM, Balbir Singh wrote:
> >Balloon unmapped page cache pages first
> >
> >From: Balbir Singh<balbir@linux.vnet.ibm.com>
> >
> >This patch builds on the ballooning infrastructure by ballooning unmapped
> >page cache pages first. It looks for low hanging fruit first and tries
> >to reclaim clean unmapped pages first.
> 
> I'm not sure victimizing unmapped cache pages is a good idea.
> Shouldn't page selection use the LRU for recency information instead
> of the cost of guest reclaim?  Dropping a frequently used unmapped
> cache page can be more expensive than dropping an unused text page
> that was loaded as part of some executable's initialization and
> forgotten.
>

We victimize the unmapped cache only if it is unused (in LRU order).
We don't force the issue too much. We also have free slab cache to go
after.

> Many workloads have many unmapped cache pages, for example static
> web serving and the all-important kernel build.
> 

I've tested kernbench; you can see the results in the original posting,
and there is no observable overhead as a result of the patch in my
run.

> >The key advantage was that it resulted in lesser RSS usage in the host and
> >more cached usage, indicating that the caching had been pushed towards
> >the host. The guest cached memory usage was lower and free memory in
> >the guest was also higher.
> 
> Caching in the host is only helpful if the cache can be shared,
> otherwise it's better to cache in the guest.
>

Hmm.. so we would need a balloon cache hint from the monitor, so that
it is not unconditional? Overall my results show the following:

1. No drastic reduction of guest unmapped cache, just sufficient to
show lesser RSS in the host. More freeable memory (as in cached
memory + free memory) visible on the host.
2. No significant impact on the benchmark (numbers) running in the
guest.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread


* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-10 14:25       ` Balbir Singh
@ 2010-06-11  0:07         ` Dave Hansen
  -1 siblings, 0 replies; 96+ messages in thread
From: Dave Hansen @ 2010-06-11  0:07 UTC (permalink / raw)
  To: balbir; +Cc: Avi Kivity, kvm, linux-mm, linux-kernel

On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:
> > I'm not sure victimizing unmapped cache pages is a good idea.
> > Shouldn't page selection use the LRU for recency information instead
> > of the cost of guest reclaim?  Dropping a frequently used unmapped
> > cache page can be more expensive than dropping an unused text page
> > that was loaded as part of some executable's initialization and
> > forgotten.
> 
> We victimize the unmapped cache only if it is unused (in LRU order).
> We don't force the issue too much. We also have free slab cache to go
> after.

Just to be clear, let's say we have a mapped page (say of /sbin/init)
that's been unreferenced since _just_ after the system booted.  We also
have an unmapped page cache page of a file often used at runtime, say
one from /etc/resolv.conf or /etc/passwd.

Which page will be preferred for eviction with this patch set?

-- Dave


^ permalink raw reply	[flat|nested] 96+ messages in thread


* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-11  0:07         ` Dave Hansen
@ 2010-06-11  1:54           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 96+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-06-11  1:54 UTC (permalink / raw)
  To: Dave Hansen; +Cc: balbir, Avi Kivity, kvm, linux-mm, linux-kernel

On Thu, 10 Jun 2010 17:07:32 -0700
Dave Hansen <dave@linux.vnet.ibm.com> wrote:

> On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:
> > > I'm not sure victimizing unmapped cache pages is a good idea.
> > > Shouldn't page selection use the LRU for recency information instead
> > > of the cost of guest reclaim?  Dropping a frequently used unmapped
> > > cache page can be more expensive than dropping an unused text page
> > > that was loaded as part of some executable's initialization and
> > > forgotten.
> > 
> > We victimize the unmapped cache only if it is unused (in LRU order).
> > We don't force the issue too much. We also have free slab cache to go
> > after.
> 
> Just to be clear, let's say we have a mapped page (say of /sbin/init)
> that's been unreferenced since _just_ after the system booted.  We also
> have an unmapped page cache page of a file often used at runtime, say
> one from /etc/resolv.conf or /etc/passwd.
> 

Hmm. I'm not a fan of estimating the working set size by calculation
based on some numbers without considering history or feedback.

Can't we use some kind of feedback algorithm, such as high-low watermarks,
a random walk, or a GA (or something smarter), to detect the size?

Thanks,
-Kame





^ permalink raw reply	[flat|nested] 96+ messages in thread


* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-11  1:54           ` KAMEZAWA Hiroyuki
@ 2010-06-11  4:46             ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-11  4:46 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Dave Hansen, Avi Kivity, kvm, linux-mm, linux-kernel

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-06-11 10:54:41]:

> On Thu, 10 Jun 2010 17:07:32 -0700
> Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> 
> > On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:
> > > > I'm not sure victimizing unmapped cache pages is a good idea.
> > > > Shouldn't page selection use the LRU for recency information instead
> > > > of the cost of guest reclaim?  Dropping a frequently used unmapped
> > > > cache page can be more expensive than dropping an unused text page
> > > > that was loaded as part of some executable's initialization and
> > > > forgotten.
> > > 
> > > We victimize the unmapped cache only if it is unused (in LRU order).
> > > We don't force the issue too much. We also have free slab cache to go
> > > after.
> > 
> > Just to be clear, let's say we have a mapped page (say of /sbin/init)
> > that's been unreferenced since _just_ after the system booted.  We also
> > have an unmapped page cache page of a file often used at runtime, say
> > one from /etc/resolv.conf or /etc/passwd.
> > 
> 
> Hmm. I'm not a fan of estimating the working set size by calculation
> based on some numbers without considering history or feedback.
> 
> Can't we use some kind of feedback algorithm, such as high-low watermarks,
> a random walk, or a GA (or something smarter), to detect the size?
>

Could you please clarify at what level you are suggesting size
detection? I assume it is outside the OS, right? 

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread


* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-11  0:07         ` Dave Hansen
@ 2010-06-11  4:56           ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-11  4:56 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Avi Kivity, kvm, linux-mm, linux-kernel

* Dave Hansen <dave@linux.vnet.ibm.com> [2010-06-10 17:07:32]:

> On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:
> > > I'm not sure victimizing unmapped cache pages is a good idea.
> > > Shouldn't page selection use the LRU for recency information instead
> > > of the cost of guest reclaim?  Dropping a frequently used unmapped
> > > cache page can be more expensive than dropping an unused text page
> > > that was loaded as part of some executable's initialization and
> > > forgotten.
> > 
> > We victimize the unmapped cache only if it is unused (in LRU order).
> > We don't force the issue too much. We also have free slab cache to go
> > after.
> 
> Just to be clear, let's say we have a mapped page (say of /sbin/init)
> that's been unreferenced since _just_ after the system booted.  We also
> have an unmapped page cache page of a file often used at runtime, say
> one from /etc/resolv.conf or /etc/passwd.
> 
> Which page will be preferred for eviction with this patch set?
>

In this case the order is as follows:

1. First we pick free pages, if any
2. If we don't have free pages, we go after unmapped page cache and
   slab cache
3. If that fails as well, we go after regular memory

In the scenario that you describe, we won't be able to easily free up
the frequently referenced page from /etc/*. The code will move on to
step 3 and do its regular reclaim.
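
For illustration, a minimal, self-contained userspace model of that
preference order; the structure, names and numbers below are assumptions
for illustration only, not the actual patch code:

/*
 * Toy userspace model of the preference order above.
 */
#include <stdio.h>

struct guest_mem {
	unsigned long free_pages;	/* step 1: free pages */
	unsigned long unmapped_cache;	/* step 2: unmapped page cache */
	unsigned long free_slab;	/* step 2: freeable slab pages */
	unsigned long mapped;		/* step 3: regular (mapped) memory */
};

/* take up to 'want' pages from 'pool', return how many we actually got */
static unsigned long take(unsigned long *pool, unsigned long want)
{
	unsigned long got = (*pool < want) ? *pool : want;

	*pool -= got;
	return got;
}

static unsigned long reclaim(struct guest_mem *m, unsigned long want)
{
	unsigned long got;

	got = take(&m->free_pages, want);			/* step 1 */
	if (got < want)
		got += take(&m->unmapped_cache, want - got);	/* step 2 */
	if (got < want)
		got += take(&m->free_slab, want - got);		/* step 2 */
	if (got < want)
		got += take(&m->mapped, want - got);		/* step 3 */
	return got;
}

int main(void)
{
	struct guest_mem m = { 100, 400, 50, 2000 };

	/* asks for 600 pages: 100 free + 400 unmapped cache + 50 slab + 50 mapped */
	printf("reclaimed %lu pages\n", reclaim(&m, 600));
	return 0;
}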

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-11  4:46             ` Balbir Singh
@ 2010-06-11  5:05               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 96+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-06-11  5:05 UTC (permalink / raw)
  To: balbir; +Cc: Dave Hansen, Avi Kivity, kvm, linux-mm, linux-kernel

On Fri, 11 Jun 2010 10:16:32 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-06-11 10:54:41]:
> 
> > On Thu, 10 Jun 2010 17:07:32 -0700
> > Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> > 
> > > On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:
> > > > > I'm not sure victimizing unmapped cache pages is a good idea.
> > > > > Shouldn't page selection use the LRU for recency information instead
> > > > > of the cost of guest reclaim?  Dropping a frequently used unmapped
> > > > > cache page can be more expensive than dropping an unused text page
> > > > > that was loaded as part of some executable's initialization and
> > > > > forgotten.
> > > > 
> > > > We victimize the unmapped cache only if it is unused (in LRU order).
> > > > We don't force the issue too much. We also have free slab cache to go
> > > > after.
> > > 
> > > Just to be clear, let's say we have a mapped page (say of /sbin/init)
> > > that's been unreferenced since _just_ after the system booted.  We also
> > > have an unmapped page cache page of a file often used at runtime, say
> > > one from /etc/resolv.conf or /etc/passwd.
> > > 
> > 
> > Hmm. I'm not fan of estimating working set size by calculation
> > based on some numbers without considering history or feedback.
> > 
> > Can't we use some kind of feedback algorithm as hi-low-watermark, random walk
> > or GA (or somehing more smart) to detect the size ?
> >
> 
> Could you please clarify at what level you are suggesting size
> detection? I assume it is outside the OS, right? 
> 
"OS" includes kernel and system programs ;)

I can think of approaches both in the kernel and in user space, and they
should complement each other.

An example of a kernel-based approach is:
 1. add a shrinker callback (A) for the balloon-driver-for-guest, as a guest kswapd.
 2. add a shrinker callback (B) for the balloon-driver-for-host, as a host kswapd.
(I guess the current balloon driver is only for the host. Please imagine.)

(A) increases free memory in the Guest.
(B) increases free memory in the Host.

This is an example of feedback-based memory resizing between host and guest.

I think (B) is necessary, at least before considering complicated things.

To implement something clever, (A) and (B) should take into account how
frequently memory reclaim in the guest (which requires some I/O) happens.

If done outside the kernel, I think using memcg is better than depending on
the balloon driver. But a co-operative balloon and memcg may show us
something good.
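
For concreteness, a rough sketch of what a shrinker-based (A) could look
like. This uses the count_objects/scan_objects shrinker interface of later
kernels (the interface at the time used a single ->shrink() callback, and
registration details differ between versions), and
balloon_inflated_pages()/balloon_deflate_pages() are hypothetical helpers,
not an existing driver API:

#include <linux/init.h>
#include <linux/shrinker.h>

/* hypothetical helpers that a balloon driver would provide */
extern unsigned long balloon_inflated_pages(void);
extern unsigned long balloon_deflate_pages(unsigned long nr);

static unsigned long balloon_shrink_count(struct shrinker *s,
					  struct shrink_control *sc)
{
	/* how many pages we could hand back to the guest */
	return balloon_inflated_pages();
}

static unsigned long balloon_shrink_scan(struct shrinker *s,
					 struct shrink_control *sc)
{
	/* guest memory pressure: deflate the balloon instead of reclaiming cache */
	return balloon_deflate_pages(sc->nr_to_scan);
}

static struct shrinker balloon_shrinker = {
	.count_objects	= balloon_shrink_count,
	.scan_objects	= balloon_shrink_scan,
	.seeks		= DEFAULT_SEEKS,
};

static int __init balloon_shrinker_init(void)
{
	/* exact registration call varies by kernel version */
	return register_shrinker(&balloon_shrinker);
}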

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-11  5:05               ` KAMEZAWA Hiroyuki
@ 2010-06-11  5:08                 ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 96+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-06-11  5:08 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: balbir, Dave Hansen, Avi Kivity, kvm, linux-mm, linux-kernel

On Fri, 11 Jun 2010 14:05:53 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> I can think of both way in kernel and in user approarh and they should be
> complement to each other.
> 
> An example of kernel-based approach is.
>  1. add a shrinker callback(A) for balloon-driver-for-guest as guest kswapd.
>  2. add a shrinker callback(B) for balloon-driver-for-host as host kswapd.
> (I guess current balloon driver is only for host. Please imagine.)
                                              ^^^^
                                              guest.
Sorry.
-Kame


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-11  5:05               ` KAMEZAWA Hiroyuki
@ 2010-06-11  6:14                 ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-11  6:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Dave Hansen, Avi Kivity, kvm, linux-mm, linux-kernel

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-06-11 14:05:53]:

> On Fri, 11 Jun 2010 10:16:32 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-06-11 10:54:41]:
> > 
> > > On Thu, 10 Jun 2010 17:07:32 -0700
> > > Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> > > 
> > > > On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:
> > > > > > I'm not sure victimizing unmapped cache pages is a good idea.
> > > > > > Shouldn't page selection use the LRU for recency information instead
> > > > > > of the cost of guest reclaim?  Dropping a frequently used unmapped
> > > > > > cache page can be more expensive than dropping an unused text page
> > > > > > that was loaded as part of some executable's initialization and
> > > > > > forgotten.
> > > > > 
> > > > > We victimize the unmapped cache only if it is unused (in LRU order).
> > > > > We don't force the issue too much. We also have free slab cache to go
> > > > > after.
> > > > 
> > > > Just to be clear, let's say we have a mapped page (say of /sbin/init)
> > > > that's been unreferenced since _just_ after the system booted.  We also
> > > > have an unmapped page cache page of a file often used at runtime, say
> > > > one from /etc/resolv.conf or /etc/passwd.
> > > > 
> > > 
> > > Hmm. I'm not fan of estimating working set size by calculation
> > > based on some numbers without considering history or feedback.
> > > 
> > > Can't we use some kind of feedback algorithm as hi-low-watermark, random walk
> > > or GA (or somehing more smart) to detect the size ?
> > >
> > 
> > Could you please clarify at what level you are suggesting size
> > detection? I assume it is outside the OS, right? 
> > 
> "OS" includes kernel and system programs ;)
> 
> I can think of both way in kernel and in user approarh and they should be
> complement to each other.
> 
> An example of kernel-based approach is.
>  1. add a shrinker callback(A) for balloon-driver-for-guest as guest kswapd.
>  2. add a shrinker callback(B) for balloon-driver-for-host as host kswapd.
> (I guess current balloon driver is only for host. Please imagine.)
> 
> (A) increases free memory in Guest.
> (B) increases free memory in Host.
> 
> This is an example of feedback based memory resizing between host and guest.
> 
> I think (B) is necessary at least before considering complecated things.

B is left to the hypervisor and the memory policy running on it. My
patches address Linux running as a guest, with a Linux hypervisor at
the moment, but that can be extended to other balloon drivers as well.

> 
> To implement something clever,  (A) and (B) should take into account that
> how frequently memory reclaim in guest (which requires some I/O) happens.
> 

Yes, I think the policy in the hypervisor needs to look at those
details as well.

> If doing outside kernel, I think using memcg is better than depends on
> balloon driver. But co-operative balloon and memcg may show us something
> good.
> 

Yes, agreed. Co-operative is better; if there is no co-operation, then
memcg might be used for enforcement.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 1/2] Linux/Guest unmapped page cache control
  2010-06-08 15:51   ` Balbir Singh
@ 2010-06-13 18:31     ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-13 18:31 UTC (permalink / raw)
  To: kvm; +Cc: Avi Kivity, linux-mm, linux-kernel

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-06-08 21:21:46]:

> Selectively control Unmapped Page Cache (nospam version)
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> This patch implements unmapped page cache control via preferred
> page cache reclaim. The current patch hooks into kswapd and reclaims
> page cache if the user has requested for unmapped page control.
> This is useful in the following scenario
> 
> - In a virtualized environment with cache=writethrough, we see
>   double caching - (one in the host and one in the guest). As
>   we try to scale guests, cache usage across the system grows.
>   The goal of this patch is to reclaim page cache when Linux is running
>   as a guest and get the host to hold the page cache and manage it.
>   There might be temporary duplication, but in the long run, memory
>   in the guests would be used for mapped pages.
> - The option is controlled via a boot option and the administrator
>   can selectively turn it on, on a need to use basis.
> 
> A lot of the code is borrowed from zone_reclaim_mode logic for
> __zone_reclaim(). One might argue that the with ballooning and
> KSM this feature is not very useful, but even with ballooning,
> we need extra logic to balloon multiple VM machines and it is hard
> to figure out the correct amount of memory to balloon. With these
> patches applied, each guest has a sufficient amount of free memory
> available, that can be easily seen and reclaimed by the balloon driver.
> The additional memory in the guest can be reused for additional
> applications or used to start additional guests/balance memory in
> the host.
> 
> KSM currently does not de-duplicate host and guest page cache. The goal
> of this patch is to help automatically balance unmapped page cache when
> instructed to do so.
> 
> There are some magic numbers in use in the code, UNMAPPED_PAGE_RATIO
> and the number of pages to reclaim when unmapped_page_control argument
> is supplied. These numbers were chosen to avoid aggressiveness in
> reaping page cache ever so frequently, at the same time providing control.
> 
> The sysctl for min_unmapped_ratio provides further control from
> within the guest on the amount of unmapped pages to reclaim.
>
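
To make the knobs described above concrete, a rough sketch of the kind of
check involved, modelled on the zone counters of that era; the helper name
and the exact condition are assumptions based on the description, not the
posted patch:

#include <linux/mmzone.h>
#include <linux/vmstat.h>

extern int unmapped_page_control;	/* the boot parameter described above */

static bool should_reclaim_unmapped_cache(struct zone *zone)
{
	unsigned long unmapped;

	/* file-backed pages minus the ones some process has mapped */
	unmapped = zone_page_state(zone, NR_FILE_PAGES) -
		   zone_page_state(zone, NR_FILE_MAPPED);

	/* min_unmapped_pages is derived from the min_unmapped_ratio sysctl */
	return unmapped_page_control &&
	       unmapped > zone->min_unmapped_pages;
}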

Are there any major objections to this patch?
 
-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 1/2] Linux/Guest unmapped page cache control
  2010-06-13 18:31     ` Balbir Singh
@ 2010-06-14  0:28       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 96+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-06-14  0:28 UTC (permalink / raw)
  To: balbir; +Cc: kvm, Avi Kivity, linux-mm, linux-kernel

On Mon, 14 Jun 2010 00:01:45 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * Balbir Singh <balbir@linux.vnet.ibm.com> [2010-06-08 21:21:46]:
> 
> > Selectively control Unmapped Page Cache (nospam version)
> > 
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > 
> > This patch implements unmapped page cache control via preferred
> > page cache reclaim. The current patch hooks into kswapd and reclaims
> > page cache if the user has requested for unmapped page control.
> > This is useful in the following scenario
> > 
> > - In a virtualized environment with cache=writethrough, we see
> >   double caching - (one in the host and one in the guest). As
> >   we try to scale guests, cache usage across the system grows.
> >   The goal of this patch is to reclaim page cache when Linux is running
> >   as a guest and get the host to hold the page cache and manage it.
> >   There might be temporary duplication, but in the long run, memory
> >   in the guests would be used for mapped pages.
> > - The option is controlled via a boot option and the administrator
> >   can selectively turn it on, on a need to use basis.
> > 
> > A lot of the code is borrowed from zone_reclaim_mode logic for
> > __zone_reclaim(). One might argue that the with ballooning and
> > KSM this feature is not very useful, but even with ballooning,
> > we need extra logic to balloon multiple VM machines and it is hard
> > to figure out the correct amount of memory to balloon. With these
> > patches applied, each guest has a sufficient amount of free memory
> > available, that can be easily seen and reclaimed by the balloon driver.
> > The additional memory in the guest can be reused for additional
> > applications or used to start additional guests/balance memory in
> > the host.
> > 
> > KSM currently does not de-duplicate host and guest page cache. The goal
> > of this patch is to help automatically balance unmapped page cache when
> > instructed to do so.
> > 
> > There are some magic numbers in use in the code, UNMAPPED_PAGE_RATIO
> > and the number of pages to reclaim when unmapped_page_control argument
> > is supplied. These numbers were chosen to avoid aggressiveness in
> > reaping page cache ever so frequently, at the same time providing control.
> > 
> > The sysctl for min_unmapped_ratio provides further control from
> > within the guest on the amount of unmapped pages to reclaim.
> >
> 
> Are there any major objections to this patch?
>  

This kind of patch needs a "how well it works" measurement.

- How did you measure the effect of the patch? kernbench is not enough, of course.
- Why don't you believe the LRU? And if the LRU doesn't work well, should it be
  fixed by a knob rather than by a generic approach?
- No side effects?

- Linux VM guys tend to say "free memory is bad memory". OK, so what is the
  free memory created by your patch used for? IOW, I can't see the benefit.
  If the free memory that your patch creates is used for another page cache,
  it will soon be dropped again by your patch itself.

  If your patch only dropped pages that are "duplicated, but no longer
  necessary because of the other kvm", I would agree that it may increase the
  available amount of page cache. But you just drop unmapped pages.
  Hmm.

Thanks,
-Kame
 


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 1/2] Linux/Guest unmapped page cache control
  2010-06-14  0:28       ` KAMEZAWA Hiroyuki
@ 2010-06-14  6:49         ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-14  6:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: kvm, Avi Kivity, linux-mm, linux-kernel

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-06-14 09:28:19]:

> On Mon, 14 Jun 2010 00:01:45 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * Balbir Singh <balbir@linux.vnet.ibm.com> [2010-06-08 21:21:46]:
> > 
> > > Selectively control Unmapped Page Cache (nospam version)
> > > 
> > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > 
> > > This patch implements unmapped page cache control via preferred
> > > page cache reclaim. The current patch hooks into kswapd and reclaims
> > > page cache if the user has requested for unmapped page control.
> > > This is useful in the following scenario
> > > 
> > > - In a virtualized environment with cache=writethrough, we see
> > >   double caching - (one in the host and one in the guest). As
> > >   we try to scale guests, cache usage across the system grows.
> > >   The goal of this patch is to reclaim page cache when Linux is running
> > >   as a guest and get the host to hold the page cache and manage it.
> > >   There might be temporary duplication, but in the long run, memory
> > >   in the guests would be used for mapped pages.
> > > - The option is controlled via a boot option and the administrator
> > >   can selectively turn it on, on a need to use basis.
> > > 
> > > A lot of the code is borrowed from zone_reclaim_mode logic for
> > > __zone_reclaim(). One might argue that the with ballooning and
> > > KSM this feature is not very useful, but even with ballooning,
> > > we need extra logic to balloon multiple VM machines and it is hard
> > > to figure out the correct amount of memory to balloon. With these
> > > patches applied, each guest has a sufficient amount of free memory
> > > available, that can be easily seen and reclaimed by the balloon driver.
> > > The additional memory in the guest can be reused for additional
> > > applications or used to start additional guests/balance memory in
> > > the host.
> > > 
> > > KSM currently does not de-duplicate host and guest page cache. The goal
> > > of this patch is to help automatically balance unmapped page cache when
> > > instructed to do so.
> > > 
> > > There are some magic numbers in use in the code, UNMAPPED_PAGE_RATIO
> > > and the number of pages to reclaim when unmapped_page_control argument
> > > is supplied. These numbers were chosen to avoid aggressiveness in
> > > reaping page cache ever so frequently, at the same time providing control.
> > > 
> > > The sysctl for min_unmapped_ratio provides further control from
> > > within the guest on the amount of unmapped pages to reclaim.
> > >
> > 
> > Are there any major objections to this patch?
> >  
> 
> This kind of patch needs "how it works well" measurement.
> 
> - How did you measure the effect of the patch ? kernbench is not enough, of course.

I can run other benchmarks as well; I will do so.

> - Why don't you believe LRU ? And if LRU doesn't work well, should it be
>   fixed by a knob rather than generic approach ?
> - No side effects ?

I believe in the LRU; it is just that the problem I am trying to solve is
that of using double the memory for caching the same data (consider kvm
running in cache=writethrough or writeback mode, where both the hypervisor
and the guest OS maintain a page cache of the same data). As the VMs
grow, the overhead is substantial. In my runs I found up to 60%
duplication in some cases.

> 
> - Linux vm guys tend to say, "free memory is bad memory". ok, for what
>   free memory created by your patch is used ? IOW, I can't see the benefit.
>   If free memory that your patch created will be used for another page-cache,
>   it will be dropped soon by your patch itself.
> 

Free memory is good for cases when you want to do more on the same
system. I agree that in a bare metal environment that might be
partially true. I don't have a problem with frequently used data being
cached, but I am targeting a consolidated environment at the moment.
Moreover, the administrator has control via a boot option, so it is
non-intrusive in many ways.

>   If your patch just drops "duplicated, but no more necessary for other kvm",
>   I agree your patch may increase available size of page-caches. But you just
>   drops unmapped pages.
>

Unmapped and unused pages are the best targets; I plan to add slab cache control later.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 1/2] Linux/Guest unmapped page cache control
  2010-06-14  6:49         ` Balbir Singh
@ 2010-06-14  7:00           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 96+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-06-14  7:00 UTC (permalink / raw)
  To: balbir; +Cc: kvm, Avi Kivity, linux-mm, linux-kernel

On Mon, 14 Jun 2010 12:19:55 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > - Why don't you believe LRU ? And if LRU doesn't work well, should it be
> >   fixed by a knob rather than generic approach ?
> > - No side effects ?
> 
> I believe in LRU, just that the problem I am trying to solve is of
> using double the memory for caching the same data (consider kvm
> running in cache=writethrough or writeback mode, both the hypervisor
> and the guest OS maintain a page cache of the same data). As the VM's
> grow the overhead is substantial. In my runs I found upto 60%
> duplication in some cases.
> 
> 
> - Linux vm guys tend to say, "free memory is bad memory". ok, for what
>   free memory created by your patch is used ? IOW, I can't see the benefit.
>   If free memory that your patch created will be used for another page-cache,
>   it will be dropped soon by your patch itself.
> 
> Free memory is good for cases when you want to do more in the same
> system. I agree that in a bare metail environment that might be
> partially true. I don't have a problem with frequently used data being
> cached, but I am targetting a consolidated environment at the moment.
> Moreover, the administrator has control via a boot option, so it is
> non-instrusive in many ways.

It sounds that what you want is to improve performance etc. but to make it
easy to size the system and to help admins. Right?

From a performance perspective, I don't see any advantage in dropping caches
that can be dropped easily anyway. It just uses CPU for a purpose that may
not be necessary.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 1/2] Linux/Guest unmapped page cache control
  2010-06-14  7:00           ` KAMEZAWA Hiroyuki
@ 2010-06-14  7:36             ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-14  7:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: kvm, Avi Kivity, linux-mm, linux-kernel

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-06-14 16:00:21]:

> On Mon, 14 Jun 2010 12:19:55 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > - Why don't you believe LRU ? And if LRU doesn't work well, should it be
> > >   fixed by a knob rather than generic approach ?
> > > - No side effects ?
> > 
> > I believe in LRU, just that the problem I am trying to solve is of
> > using double the memory for caching the same data (consider kvm
> > running in cache=writethrough or writeback mode, both the hypervisor
> > and the guest OS maintain a page cache of the same data). As the VM's
> > grow the overhead is substantial. In my runs I found upto 60%
> > duplication in some cases.
> > 
> > 
> > - Linux vm guys tend to say, "free memory is bad memory". ok, for what
> >   free memory created by your patch is used ? IOW, I can't see the benefit.
> >   If free memory that your patch created will be used for another page-cache,
> >   it will be dropped soon by your patch itself.
> > 
> > Free memory is good for cases when you want to do more in the same
> > system. I agree that in a bare metail environment that might be
> > partially true. I don't have a problem with frequently used data being
> > cached, but I am targetting a consolidated environment at the moment.
> > Moreover, the administrator has control via a boot option, so it is
> > non-instrusive in many ways.
> 
> It sounds that what you want is to improve performance etc. but to make it
> easy sizing the system and to help admins. Right ?
>

Right, to free up the memory wasted on caching the same data twice.
 
> From performance perspective, I don't see any advantage to drop caches
> which can be dropped easily. I just use cpus for the purpose it may no
> be necessary.
> 

It is not that easy; in a virtualized environment, you don't directly
reclaim, but use a mechanism like ballooning, and that too requires
smart software to decide where to balloon from. This patch (optionally,
if enabled) optimizes that by

1. Reducing double caching
2. Not requiring newer smarts or management software to monitor and
balloon
3. Allowing better estimation of free memory by avoiding double caching
4. Allowing immediate use of free memory for other applications or the
startup of newer guest instances.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC][PATCH 1/2] Linux/Guest unmapped page cache control
  2010-06-14  7:36             ` Balbir Singh
@ 2010-06-14  7:49               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 96+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-06-14  7:49 UTC (permalink / raw)
  To: balbir; +Cc: kvm, Avi Kivity, linux-mm, linux-kernel

On Mon, 14 Jun 2010 13:06:46 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
 
> > It sounds that what you want is to improve performance etc. but to make it
> > easy sizing the system and to help admins. Right ?
> >
> 
> Right, to allow freeing up of using double the memory to cache data.
>  
Oh, sorry, let me ask again:

It sounds that what you want is _not_ to improve performance etc. but to make it
...

?

-Kame


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-11  4:56           ` Balbir Singh
@ 2010-06-14  8:09             ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-14  8:09 UTC (permalink / raw)
  To: balbir; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

On 06/11/2010 07:56 AM, Balbir Singh wrote:
>
>> Just to be clear, let's say we have a mapped page (say of /sbin/init)
>> that's been unreferenced since _just_ after the system booted.  We also
>> have an unmapped page cache page of a file often used at runtime, say
>> one from /etc/resolv.conf or /etc/passwd.
>>
>> Which page will be preferred for eviction with this patch set?
>>
>>      
> In this case the order is as follows
>
> 1. First we pick free pages if any
> 2. If we don't have free pages, we go after unmapped page cache and
> slab cache
> 3. If that fails as well, we go after regularly memory
>
> In the scenario that you describe, we'll not be able to easily free up
> the frequently referenced page from /etc/*. The code will move on to
> step 3 and do its regular reclaim.
>    

Still it seems to me you are subverting the normal order of reclaim.  I 
don't see why an unmapped page cache or slab cache item should be 
evicted before a mapped page.  Certainly the cost of rebuilding a dentry,
compared to the gain from evicting it, is much higher than the cost of
re-establishing a mapped page.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14  8:09             ` Avi Kivity
@ 2010-06-14  8:48               ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-14  8:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

* Avi Kivity <avi@redhat.com> [2010-06-14 11:09:44]:

> On 06/11/2010 07:56 AM, Balbir Singh wrote:
> >
> >>Just to be clear, let's say we have a mapped page (say of /sbin/init)
> >>that's been unreferenced since _just_ after the system booted.  We also
> >>have an unmapped page cache page of a file often used at runtime, say
> >>one from /etc/resolv.conf or /etc/passwd.
> >>
> >>Which page will be preferred for eviction with this patch set?
> >>
> >In this case the order is as follows
> >
> >1. First we pick free pages if any
> >2. If we don't have free pages, we go after unmapped page cache and
> >slab cache
> >>>3. If that fails as well, we go after regular memory
> >
> >In the scenario that you describe, we'll not be able to easily free up
> >the frequently referenced page from /etc/*. The code will move on to
> >step 3 and do its regular reclaim.
> 
> Still it seems to me you are subverting the normal order of reclaim.
> I don't see why an unmapped page cache or slab cache item should be
> evicted before a mapped page.  Certainly the cost of rebuilding a
> dentry compared to the gain from evicting it, is much higher than
> that of reestablishing a mapped page.
>

Subverting to avoid memory duplication; the word "subverting" is
overloaded, so let me try to reason a bit. First, let me explain the
problem.

Memory is a precious resource in a consolidated environment.
We don't want to waste memory via page cache duplication
(cache=writethrough and cache=writeback mode).

Now here is what we are trying to do

1. A slab page will not be freed until the entire page is free (all
slabs have been kfree'd so to speak). Normal reclaim will definitely
free this page, but a lot of it depends on how frequently we are
scanning the LRU list and when this page got added.
2. In the case of page cache (specifically unmapped page cache), there
is duplication already, so why not go after unmapped page caches when
the system is under memory pressure?

In the case of 1, we don't force a dentry to be freed, but rather a
freed page in the slab cache to be reclaimed ahead of forcing reclaim
of mapped pages.

Does the problem statement make sense? If so, do you agree with 1 and
2? Is there major concern about subverting regular reclaim? Does
subverting it make sense in the duplicated scenario?

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14  8:48               ` Balbir Singh
@ 2010-06-14 12:40                 ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-14 12:40 UTC (permalink / raw)
  To: balbir; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

On 06/14/2010 11:48 AM, Balbir Singh wrote:
>>>
>>> In this case the order is as follows
>>>
>>> 1. First we pick free pages if any
>>> 2. If we don't have free pages, we go after unmapped page cache and
>>> slab cache
>>> 3. If that fails as well, we go after regular memory
>>>
>>> In the scenario that you describe, we'll not be able to easily free up
>>> the frequently referenced page from /etc/*. The code will move on to
>>> step 3 and do its regular reclaim.
>>>        
>> Still it seems to me you are subverting the normal order of reclaim.
>> I don't see why an unmapped page cache or slab cache item should be
>> evicted before a mapped page.  Certainly the cost of rebuilding a
>> dentry compared to the gain from evicting it, is much higher than
>> that of reestablishing a mapped page.
>>
>>      
> Subverting to avoid memory duplication; the word "subverting" is
> overloaded,

Right, should have used a different one.

> let me try to reason a bit. First let me explain the
> problem
>
> Memory is a precious resource in a consolidated environment.
> We don't want to waste memory via page cache duplication
> (cache=writethrough and cache=writeback mode).
>
> Now here is what we are trying to do
>
> 1. A slab page will not be freed until the entire page is free (all
> slabs have been kfree'd so to speak). Normal reclaim will definitely
> free this page, but a lot of it depends on how frequently we are
> scanning the LRU list and when this page got added.
> 2. In the case of page cache (specifically unmapped page cache), there
> is duplication already, so why not go after unmapped page caches when
> the system is under memory pressure?
>
> In the case of 1, we don't force a dentry to be freed, but rather a
> freed page in the slab cache to be reclaimed ahead of forcing reclaim
> of mapped pages.
>    

Sounds like this should be done unconditionally, then.  An empty slab 
page is worth less than an unmapped pagecache page at all times, no?

> Does the problem statement make sense? If so, do you agree with 1 and
> 2? Is there major concern about subverting regular reclaim? Does
> subverting it make sense in the duplicated scenario?
>
>    

In the case of 2, how do you know there is duplication?  You know the 
guest caches the page, but you have no information about the host.  
Since the page is cached in the guest, the host doesn't see it 
referenced, and is likely to drop it.

If there is no duplication, then you may have dropped a recently-used 
page and will likely cause a major fault soon.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 12:40                 ` Avi Kivity
@ 2010-06-14 12:50                   ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-14 12:50 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

* Avi Kivity <avi@redhat.com> [2010-06-14 15:40:28]:

> On 06/14/2010 11:48 AM, Balbir Singh wrote:
> >>>
> >>>In this case the order is as follows
> >>>
> >>>1. First we pick free pages if any
> >>>2. If we don't have free pages, we go after unmapped page cache and
> >>>slab cache
> >>>3. If that fails as well, we go after regular memory
> >>>
> >>>In the scenario that you describe, we'll not be able to easily free up
> >>>the frequently referenced page from /etc/*. The code will move on to
> >>>step 3 and do its regular reclaim.
> >>Still it seems to me you are subverting the normal order of reclaim.
> >>I don't see why an unmapped page cache or slab cache item should be
> >>evicted before a mapped page.  Certainly the cost of rebuilding a
> >>dentry compared to the gain from evicting it, is much higher than
> >>that of reestablishing a mapped page.
> >>
> >Subverting to avoid memory duplication; the word "subverting" is
> >overloaded,
> 
> Right, should have used a different one.
> 
> >let me try to reason a bit. First let me explain the
> >problem
> >
> >Memory is a precious resource in a consolidated environment.
> >We don't want to waste memory via page cache duplication
> >(cache=writethrough and cache=writeback mode).
> >
> >Now here is what we are trying to do
> >
> >1. A slab page will not be freed until the entire page is free (all
> >slabs have been kfree'd so to speak). Normal reclaim will definitely
> >free this page, but a lot of it depends on how frequently we are
> >scanning the LRU list and when this page got added.
> >2. In the case of page cache (specifically unmapped page cache), there
> >is duplication already, so why not go after unmapped page caches when
> >the system is under memory pressure?
> >
> >In the case of 1, we don't force a dentry to be freed, but rather a
> >freed page in the slab cache to be reclaimed ahead of forcing reclaim
> >of mapped pages.
> 
> Sounds like this should be done unconditionally, then.  An empty
> slab page is worth less than an unmapped pagecache page at all
> times, no?
>

In a consolidated environment, even at the cost of some CPU to run
shrinkers, I think potentially yes.
 
> >Does the problem statement make sense? If so, do you agree with 1 and
> >2? Is there major concern about subverting regular reclaim? Does
> >subverting it make sense in the duplicated scenario?
> >
> 
> In the case of 2, how do you know there is duplication?  You know
> the guest caches the page, but you have no information about the
> host.  Since the page is cached in the guest, the host doesn't see
> it referenced, and is likely to drop it.

True, that is why the first patch is controlled via a boot parameter
that the host can pass. For the second patch, I think we'll need
something like a balloon <size> <cache?> with the cache argument being
optional. 
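
Purely to illustrate that idea (nothing like this exists today, and all of
the names below are made up), the request from the host could carry a hint
alongside the target size:

/* Hypothetical extension of the balloon request, for illustration only. */
struct balloon_request {
        u64 target_pages;                /* how far to inflate the balloon */
        u32 flags;
#define BALLOON_HINT_DROP_CACHE (1 << 0) /* prefer unmapped page cache */
};

static void handle_balloon_request(const struct balloon_request *req)
{
        /* placeholder helpers, not existing functions */
        if (req->flags & BALLOON_HINT_DROP_CACHE)
                reclaim_unmapped_page_cache_first(req->target_pages);

        fill_balloon_to(req->target_pages);
}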

> 
> If there is no duplication, then you may have dropped a
> recently-used page and will likely cause a major fault soon.
>

Yes, agreed. 

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 12:50                   ` Balbir Singh
@ 2010-06-14 13:01                     ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-14 13:01 UTC (permalink / raw)
  To: balbir; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

On 06/14/2010 03:50 PM, Balbir Singh wrote:
>
>>
>>> let me try to reason a bit. First let me explain the
>>> problem
>>>
>>> Memory is a precious resource in a consolidated environment.
>>> We don't want to waste memory via page cache duplication
>>> (cache=writethrough and cache=writeback mode).
>>>
>>> Now here is what we are trying to do
>>>
>>> 1. A slab page will not be freed until the entire page is free (all
>>> slabs have been kfree'd so to speak). Normal reclaim will definitely
>>> free this page, but a lot of it depends on how frequently we are
>>> scanning the LRU list and when this page got added.
>>> 2. In the case of page cache (specifically unmapped page cache), there
>>> is duplication already, so why not go after unmapped page caches when
>>> the system is under memory pressure?
>>>
>>> In the case of 1, we don't force a dentry to be freed, but rather a
>>> freed page in the slab cache to be reclaimed ahead of forcing reclaim
>>> of mapped pages.
>>>        
>> Sounds like this should be done unconditionally, then.  An empty
>> slab page is worth less than an unmapped pagecache page at all
>> times, no?
>>
>>      
> In a consolidated environment, even at the cost of some CPU to run
> shrinkers, I think potentially yes.
>    

I don't understand.  If you're running the shrinkers then you're 
evicting live entries, which could cost you an I/O each.  That's 
expensive, consolidated or not.

If you're not running the shrinkers, why does it matter if you're 
consolidated or not?  Drop that page unconditionally.

>>> Does the problem statement make sense? If so, do you agree with 1 and
>>> 2? Is there major concern about subverting regular reclaim? Does
>>> subverting it make sense in the duplicated scenario?
>>>
>>>        
>> In the case of 2, how do you know there is duplication?  You know
>> the guest caches the page, but you have no information about the
>> host.  Since the page is cached in the guest, the host doesn't see
>> it referenced, and is likely to drop it.
>>      
> True, that is why the first patch is controlled via a boot parameter
> that the host can pass. For the second patch, I think we'll need
> something like a balloon<size>  <cache?>  with the cache argument being
> optional.
>    

Whether a page is duplicated on the host or not is a per-page property; 
it cannot be a boot parameter.

If we drop unmapped pagecache pages, we need to be sure they can be 
backed by the host, and that depends on the amount of sharing.

Overall, I don't see how a user can tune this.  If I were a guest admin, 
I'd play it safe by not assuming the host will back me, and disabling 
the feature.

To get something like this to work, we need to reward cooperating guests 
somehow.

>> If there is no duplication, then you may have dropped a
>> recently-used page and will likely cause a major fault soon.
>>      
> Yes, agreed.
>    

So how do we deal with this?



-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14  8:48               ` Balbir Singh
@ 2010-06-14 15:12                 ` Dave Hansen
  -1 siblings, 0 replies; 96+ messages in thread
From: Dave Hansen @ 2010-06-14 15:12 UTC (permalink / raw)
  To: balbir; +Cc: Avi Kivity, kvm, linux-mm, linux-kernel

On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote:
> 1. A slab page will not be freed until the entire page is free (all
> slabs have been kfree'd so to speak). Normal reclaim will definitely
> free this page, but a lot of it depends on how frequently we are
> scanning the LRU list and when this page got added.

You don't have to be freeing entire slab pages for the reclaim to have
been useful.  You could just be making space so that _future_
allocations fill in the slab holes you just created.  You may not be
freeing pages, but you're reducing future system pressure.

If unmapped page cache is the easiest thing to evict, then it should be
the first thing that goes when a balloon request comes in, which is the
case this patch is trying to handle.  If it isn't the easiest thing to
evict, then we _shouldn't_ evict it.

-- Dave



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 13:01                     ` Avi Kivity
@ 2010-06-14 15:33                       ` Dave Hansen
  -1 siblings, 0 replies; 96+ messages in thread
From: Dave Hansen @ 2010-06-14 15:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: balbir, kvm, linux-mm, linux-kernel

On Mon, 2010-06-14 at 16:01 +0300, Avi Kivity wrote:
> If we drop unmapped pagecache pages, we need to be sure they can be 
> backed by the host, and that depends on the amount of sharing.

You also have to set the host up properly, and continue to maintain
it in a way that finds and eliminates duplicates.

I saw some benchmarks where KSM was doing great, finding lots of
duplicate pages.  Then, the host filled up, and guests started
reclaiming.  As memory pressure got worse, so did KSM's ability to find
duplicates.

At the same time, I see what you're trying to do with this.  It really
can be an alternative to ballooning if we do it right, since ballooning
would probably evict similar pages.  Although it would only work in idle
guests, what about a knob that the host can turn to just get the guest
to start running reclaim?
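
Just to sketch what I mean (the handler name is made up, and I'm hand-waving
the transport, which could be a virtio config write, a sysfs file or
whatever): the guest side would do little more than kick reclaim:

/* Hypothetical: host pokes a "start reclaiming" knob, guest wakes kswapd. */
static void guest_reclaim_knob_poked(void)
{
        struct zone *zone;

        for_each_populated_zone(zone)
                wakeup_kswapd(zone, 0);
}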

-- Dave


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 15:12                 ` Dave Hansen
@ 2010-06-14 15:34                   ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-14 15:34 UTC (permalink / raw)
  To: Dave Hansen; +Cc: balbir, kvm, linux-mm, linux-kernel

On 06/14/2010 06:12 PM, Dave Hansen wrote:
> On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote:
>    
>> 1. A slab page will not be freed until the entire page is free (all
>> slabs have been kfree'd so to speak). Normal reclaim will definitely
>> free this page, but a lot of it depends on how frequently we are
>> scanning the LRU list and when this page got added.
>>      
> You don't have to be freeing entire slab pages for the reclaim to have
> been useful.  You could just be making space so that _future_
> allocations fill in the slab holes you just created.  You may not be
> freeing pages, but you're reducing future system pressure.
>    

Depends.  If you've evicted something that will be referenced soon, 
you're increasing system pressure.

> If unmapped page cache is the easiest thing to evict, then it should be
> the first thing that goes when a balloon request comes in, which is the
> case this patch is trying to handle.  If it isn't the easiest thing to
> evict, then we _shouldn't_ evict it.
>    

Easy to evict is just one measure.  There's benefit (size of data 
evicted), cost to refill (seeks, cpu), and likelihood that the cost to 
refill will be incurred (recency).

It's all very complicated.  We need better information to make these 
decisions.  For one thing, I'd like to see age information tied to 
objects.  We may have two pages that were referenced at wildly different 
times sitting next to each other in LRU order.  We have many LRUs, but no 
idea of the relative recency of the tails of those LRUs.

If each page or object had an age, we could scale those ages by the 
benefit from reclaim and cost to refill and make a better decision as to 
what to evict first.  But of course page->age means increasing sizeof 
struct page, and we can only approximate its value by scanning the 
accessed bit, not determine it accurately (unlike the other objects 
managed by the cache).
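
Roughly the kind of weighting I have in mind, with invented names and a
made-up formula just to show the shape of the decision:

struct reclaim_candidate {
        unsigned long age;              /* time since last reference */
        unsigned long bytes_freed;      /* benefit: memory given back */
        unsigned long refill_cost;      /* seeks/cpu needed to rebuild it */
};

/* Higher score means a better eviction candidate. */
static unsigned long eviction_score(const struct reclaim_candidate *c)
{
        /* old, large, cheap-to-rebuild objects should go first */
        return c->age * c->bytes_freed / (c->refill_cost + 1);
}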

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 15:33                       ` Dave Hansen
@ 2010-06-14 15:44                         ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-14 15:44 UTC (permalink / raw)
  To: Dave Hansen; +Cc: balbir, kvm, linux-mm, linux-kernel

On 06/14/2010 06:33 PM, Dave Hansen wrote:
> On Mon, 2010-06-14 at 16:01 +0300, Avi Kivity wrote:
>    
>> If we drop unmapped pagecache pages, we need to be sure they can be
>> backed by the host, and that depends on the amount of sharing.
>>      
> You also have to set up the host up properly, and continue to maintain
> it in a way that finds and eliminates duplicates.
>
> I saw some benchmarks where KSM was doing great, finding lots of
> duplicate pages.  Then, the host filled up, and guests started
> reclaiming.  As memory pressure got worse, so did KSM's ability to find
> duplicates.
>    

Yup.  KSM needs to be backed up by ballooning, swap, and live migration.

> At the same time, I see what you're trying to do with this.  It really
> can be an alternative to ballooning if we do it right, since ballooning
> would probably evict similar pages.  Although it would only work in idle
> guests, what about a knob that the host can turn to just get the guest
> to start running reclaim?
>    

Isn't the knob in this proposal the balloon?  AFAICT, the idea here is 
to change how the guest reacts to being ballooned, but the trigger 
itself would not change.

My issue is that changing the type of object being preferentially 
reclaimed just changes the type of workload that would prematurely 
suffer from reclaim.  In this case, workloads that use a lot of unmapped 
pagecache would suffer.

btw, aren't /proc/sys/vm/swappiness and vfs_cache_pressure similar knobs?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 15:44                         ` Avi Kivity
@ 2010-06-14 15:55                           ` Dave Hansen
  -1 siblings, 0 replies; 96+ messages in thread
From: Dave Hansen @ 2010-06-14 15:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: balbir, kvm, linux-mm, linux-kernel

On Mon, 2010-06-14 at 18:44 +0300, Avi Kivity wrote:
> On 06/14/2010 06:33 PM, Dave Hansen wrote:
> > At the same time, I see what you're trying to do with this.  It really
> > can be an alternative to ballooning if we do it right, since ballooning
> > would probably evict similar pages.  Although it would only work in idle
> > guests, what about a knob that the host can turn to just get the guest
> > to start running reclaim?
> 
> Isn't the knob in this proposal the balloon?  AFAICT, the idea here is 
> to change how the guest reacts to being ballooned, but the trigger 
> itself would not change.

I think the patch was made on the following assumptions:
1. Guests will keep filling their memory with relatively worthless page
   cache that they don't really need.
2. When they do this, it hurts the overall system with no real gain for
   anyone.

In the case of ballooned guests, they _won't_ keep filling memory.  The
balloon will prevent them.  So, I guess I was just going down the path
of considering if this would be useful without ballooning in place.  To
me, it's really hard to justify _with_ ballooning in place.

> My issue is that changing the type of object being preferentially 
> reclaimed just changes the type of workload that would prematurely 
> suffer from reclaim.  In this case, workloads that use a lot of unmapped 
> pagecache would suffer.
> 
> btw, aren't /proc/sys/vm/swappiness and vfs_cache_pressure similar knobs?

Those tell you how to balance going after the different classes of
things that we can reclaim.

Again, this is useless when ballooning is being used.  But, I'm thinking
of a more general mechanism to force the system to both have MemFree
_and_ be acting as if it is under memory pressure.

Balbir, can you elaborate a bit on why you would need these patches on a
guest that is being ballooned?

-- Dave


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 15:55                           ` Dave Hansen
@ 2010-06-14 16:34                             ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-14 16:34 UTC (permalink / raw)
  To: Dave Hansen; +Cc: balbir, kvm, linux-mm, linux-kernel

On 06/14/2010 06:55 PM, Dave Hansen wrote:
> On Mon, 2010-06-14 at 18:44 +0300, Avi Kivity wrote:
>    
>> On 06/14/2010 06:33 PM, Dave Hansen wrote:
>>      
>>> At the same time, I see what you're trying to do with this.  It really
>>> can be an alternative to ballooning if we do it right, since ballooning
>>> would probably evict similar pages.  Although it would only work in idle
>>> guests, what about a knob that the host can turn to just get the guest
>>> to start running reclaim?
>>>        
>> Isn't the knob in this proposal the balloon?  AFAICT, the idea here is
>> to change how the guest reacts to being ballooned, but the trigger
>> itself would not change.
>>      
> I think the patch was made on the following assumptions:
> 1. Guests will keep filling their memory with relatively worthless page
>     cache that they don't really need.
> 2. When they do this, it hurts the overall system with no real gain for
>     anyone.
>
> In the case of a ballooned guest, they _won't_ keep filling memory.  The
> balloon will prevent them.  So, I guess I was just going down the path
> of considering if this would be useful without ballooning in place.  To
> me, it's really hard to justify _with_ ballooning in place.
>    

There are two decisions that need to be made:

- how much memory a guest should be given
- given some guest memory, what's the best use for it

The first question can perhaps be answered by looking at guest I/O rates 
and giving more memory to more active guests.  The second question is 
hard, but not any different than running non-virtualized - except if we 
can detect sharing or duplication.  In this case, dropping a duplicated 
page is worthwhile, while dropping a shared page provides no benefit.

How the patch helps answer either question, I'm not sure.  I don't think 
preferential dropping of unmapped page cache is the answer.

>> My issue is that changing the type of object being preferentially
>> reclaimed just changes the type of workload that would prematurely
>> suffer from reclaim.  In this case, workloads that use a lot of unmapped
>> pagecache would suffer.
>>
>> btw, aren't /proc/sys/vm/swapiness and vfs_cache_pressure similar knobs?
>>      
> Those tell you how to balance going after the different classes of
> things that we can reclaim.
>
> Again, this is useless when ballooning is being used.  But, I'm thinking
> of a more general mechanism to force the system to both have MemFree
> _and_ be acting as if it is under memory pressure.
>    

If there is no memory pressure on the host, there is no reason for the 
guest to pretend it is under pressure.  If there is memory pressure on 
the host, it should share the pain among its guests by applying the 
balloon.  So I don't think voluntarily dropping cache is a good direction.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 15:12                 ` Dave Hansen
@ 2010-06-14 16:58                   ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-14 16:58 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Avi Kivity, kvm, linux-mm, linux-kernel

* Dave Hansen <dave@linux.vnet.ibm.com> [2010-06-14 08:12:56]:

> On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote:
> > 1. A slab page will not be freed until the entire page is free (all
> > slabs have been kfree'd so to speak). Normal reclaim will definitely
> > free this page, but a lot of it depends on how frequently we are
> > scanning the LRU list and when this page got added.
> 
> You don't have to be freeing entire slab pages for the reclaim to have
> been useful.  You could just be making space so that _future_
> allocations fill in the slab holes you just created.  You may not be
> freeing pages, but you're reducing future system pressure.
> 
> If unmapped page cache is the easiest thing to evict, then it should be
> the first thing that goes when a balloon request comes in, which is the
> case this patch is trying to handle.  If it isn't the easiest thing to
> evict, then we _shouldn't_ evict it.
>

Like I said earlier, a lot of that works correctly as you said, but it
is also an idealization. If you've got duplicate pages and you know
that they are duplicated and can be retrieved at a lower cost, why
wouldn't we go after them first?

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 16:58                   ` Balbir Singh
@ 2010-06-14 17:09                     ` Dave Hansen
  -1 siblings, 0 replies; 96+ messages in thread
From: Dave Hansen @ 2010-06-14 17:09 UTC (permalink / raw)
  To: balbir; +Cc: Avi Kivity, kvm, linux-mm, linux-kernel

On Mon, 2010-06-14 at 22:28 +0530, Balbir Singh wrote:
> If you've got duplicate pages and you know
> that they are duplicated and can be retrieved at a lower cost, why
> wouldn't we go after them first?

I agree with this in theory.  But, the guest lacks the information about
what is truly duplicated and what the costs are for itself and/or the
host to recreate it.  "Unmapped page cache" may be the best proxy that
we have at the moment for "easy to recreate", but I think it's still too
poor a match to make these patches useful.

-- Dave


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 17:09                     ` Dave Hansen
@ 2010-06-14 17:16                       ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-14 17:16 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Avi Kivity, kvm, linux-mm, linux-kernel

* Dave Hansen <dave@linux.vnet.ibm.com> [2010-06-14 10:09:31]:

> On Mon, 2010-06-14 at 22:28 +0530, Balbir Singh wrote:
> > If you've got duplicate pages and you know
> > that they are duplicated and can be retrieved at a lower cost, why
> > wouldn't we go after them first?
> 
> I agree with this in theory.  But, the guest lacks the information about
> what is truly duplicated and what the costs are for itself and/or the
> host to recreate it.  "Unmapped page cache" may be the best proxy that
> we have at the moment for "easy to recreate", but I think it's still too
> poor a match to make these patches useful.
>

That is why the policy (in the next set) will come from the host. As
to whether the data is truly duplicated, my experiments show that up
to 60% of the page cache is duplicated. The first patch today is
likewise enabled by the host. Both of them are expected to be useful
in the cache != none case.

The data I have shows more detail, including the performance and
overhead.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 15:34                   ` Avi Kivity
@ 2010-06-14 17:40                     ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-14 17:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

* Avi Kivity <avi@redhat.com> [2010-06-14 18:34:58]:

> On 06/14/2010 06:12 PM, Dave Hansen wrote:
> >On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote:
> >>1. A slab page will not be freed until the entire page is free (all
> >>slabs have been kfree'd so to speak). Normal reclaim will definitely
> >>free this page, but a lot of it depends on how frequently we are
> >>scanning the LRU list and when this page got added.
> >You don't have to be freeing entire slab pages for the reclaim to have
> >been useful.  You could just be making space so that _future_
> >allocations fill in the slab holes you just created.  You may not be
> >freeing pages, but you're reducing future system pressure.
> 
> Depends.  If you've evicted something that will be referenced soon,
> you're increasing system pressure.
>

I don't think slab pages care about being referenced soon; they are
either allocated or freed. A page is just a storage unit for the data
structure, and a new one can be allocated on demand.
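
To make the slab page lifetime point concrete, here is a minimal,
self-contained module-style sketch (not from these patches; the cache
and object names are invented for the example, only the standard
kmem_cache API is real):

#include <linux/module.h>
#include <linux/slab.h>

/* Made-up object type and names, purely for illustration. */
struct demo_obj {
	unsigned long payload[8];
};

static struct kmem_cache *demo_cache;
static struct demo_obj *objs[16];

static int __init demo_init(void)
{
	int i;

	demo_cache = kmem_cache_create("demo_cache", sizeof(struct demo_obj),
				       0, 0, NULL);
	if (!demo_cache)
		return -ENOMEM;

	/* Several small objects typically share one backing slab page. */
	for (i = 0; i < 16; i++)
		objs[i] = kmem_cache_alloc(demo_cache, GFP_KERNEL);

	/*
	 * Freeing most of the objects only punches holes in the slab.
	 * The backing page cannot be returned to the page allocator
	 * until every object on it has been freed.
	 */
	for (i = 1; i < 16; i++)
		if (objs[i])
			kmem_cache_free(demo_cache, objs[i]);

	return 0;
}

static void __exit demo_exit(void)
{
	if (objs[0])
		kmem_cache_free(demo_cache, objs[0]); /* last object: now the page can go */
	kmem_cache_destroy(demo_cache);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");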
 
 

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 16:34                             ` Avi Kivity
@ 2010-06-14 17:45                               ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-14 17:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

* Avi Kivity <avi@redhat.com> [2010-06-14 19:34:00]:

> On 06/14/2010 06:55 PM, Dave Hansen wrote:
> >On Mon, 2010-06-14 at 18:44 +0300, Avi Kivity wrote:
> >>On 06/14/2010 06:33 PM, Dave Hansen wrote:
> >>>At the same time, I see what you're trying to do with this.  It really
> >>>can be an alternative to ballooning if we do it right, since ballooning
> >>>would probably evict similar pages.  Although it would only work in idle
> >>>guests, what about a knob that the host can turn to just get the guest
> >>>to start running reclaim?
> >>Isn't the knob in this proposal the balloon?  AFAICT, the idea here is
> >>to change how the guest reacts to being ballooned, but the trigger
> >>itself would not change.
> >I think the patch was made on the following assumptions:
> >1. Guests will keep filling their memory with relatively worthless page
> >    cache that they don't really need.
> >2. When they do this, it hurts the overall system with no real gain for
> >    anyone.
> >
> >In the case of a ballooned guest, they _won't_ keep filling memory.  The
> >balloon will prevent them.  So, I guess I was just going down the path
> >of considering if this would be useful without ballooning in place.  To
> >me, it's really hard to justify _with_ ballooning in place.
> 
> There are two decisions that need to be made:
> 
> - how much memory a guest should be given
> - given some guest memory, what's the best use for it
> 
> The first question can perhaps be answered by looking at guest I/O
> rates and giving more memory to more active guests.  The second
> question is hard, but not any different than running non-virtualized
> - except if we can detect sharing or duplication.  In this case,
> dropping a duplicated page is worthwhile, while dropping a shared
> page provides no benefit.

I think there is another way of looking at it: given some free memory,

1. Can the guest run more applications, or run faster?
2. Can the host potentially get this memory back via ballooning or
some other means, to start newer guest instances?

I think the answer to both 1 and 2 is yes.

> 
> How the patch helps answer either question, I'm not sure.  I don't
> think preferential dropping of unmapped page cache is the answer.
>

Preferential dropping is selected by the host, which knows about the
setup and whether there is duplication involved. While we use the term
preferential dropping, remember it is still done via the LRU and we
don't always succeed. It is a best-effort scenario (only if we can,
and only if the unmapped pages are not highly referenced).
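
To spell out the class of pages being discussed, something like the
following illustrative helper captures it. This is only a sketch of
the idea, not code from the patch; page_is_unmapped_cache() is a
made-up name, and it simply combines existing page-level tests:

#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Hypothetical helper (not from the patch): is this LRU page plain,
 * unmapped page cache that the host most likely also holds a copy of?
 */
static bool page_is_unmapped_cache(struct page *page)
{
	/* Must be file-backed page cache, not anonymous memory. */
	if (PageAnon(page) || !page_mapping(page))
		return false;

	/* Mapped into some process: dropping it is not "cheap". */
	if (page_mapped(page))
		return false;

	/* Dirty or writeback pages still have to reach the disk/host. */
	if (PageDirty(page) || PageWriteback(page))
		return false;

	return true;
}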
 
> >>My issue is that changing the type of object being preferentially
> >>reclaimed just changes the type of workload that would prematurely
> >>suffer from reclaim.  In this case, workloads that use a lot of unmapped
> >>pagecache would suffer.
> >>
> >>btw, aren't /proc/sys/vm/swapiness and vfs_cache_pressure similar knobs?
> >Those tell you how to balance going after the different classes of
> >things that we can reclaim.
> >
> >Again, this is useless when ballooning is being used.  But, I'm thinking
> >of a more general mechanism to force the system to both have MemFree
> >_and_ be acting as if it is under memory pressure.
> 
> If there is no memory pressure on the host, there is no reason for
> the guest to pretend it is under pressure.  If there is memory
> pressure on the host, it should share the pain among its guests by
> applying the balloon.  So I don't think voluntarily dropping cache
> is a good direction.
>

There are two situations:

1. Voluntarily drop cache, if it was set up to do so (the host knows
that it caches that information anyway).
2. Drop the cache on a special balloon option; again, the host knows
it caches that very same information, so it prefers to free that up
first.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 16:34                             ` Avi Kivity
@ 2010-06-14 17:58                               ` Dave Hansen
  -1 siblings, 0 replies; 96+ messages in thread
From: Dave Hansen @ 2010-06-14 17:58 UTC (permalink / raw)
  To: Avi Kivity; +Cc: balbir, kvm, linux-mm, linux-kernel

On Mon, 2010-06-14 at 19:34 +0300, Avi Kivity wrote:
> > Again, this is useless when ballooning is being used.  But, I'm thinking
> > of a more general mechanism to force the system to both have MemFree
> > _and_ be acting as if it is under memory pressure.
> >    
> 
> If there is no memory pressure on the host, there is no reason for the 
> guest to pretend it is under pressure.

I can think of quite a few places where this would be beneficial.

Ballooning is dangerous.  I've OOMed quite a few guests by
over-ballooning them.  Anything that's voluntary like this is safer than
things imposed by the host, although you do trade off some effectiveness.

If all the guests do this, then it leaves that much more free memory on
the host, which can be used flexibly for extra host page cache, new
guests, etc...  A system in this state where everyone is proactively
keeping their footprints down is more likely to be able to handle load
spikes.  Reclaim is an expensive activity, and this ensures that we
don't have to do it when we're busy doing other things like handling
load spikes.  This was one of the concepts behind CMM2: reduce
the overhead during peak periods.

It's also handy for planning.  Guests exhibiting this behavior will
_act_ as if they're under pressure.  That's a good thing to approximate
how a guest will act when it _is_ under pressure.

> If there is memory pressure on 
> the host, it should share the pain among its guests by applying the 
> balloon.  So I don't think voluntarily dropping cache is a good direction.

I think we're trying to consider things slightly outside of ballooning
at this point.  If ballooning was the end-all solution, I'm fairly sure
Balbir wouldn't be looking at this stuff.  Just trying to keep options
open. :)

-- Dave


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 17:45                               ` Balbir Singh
@ 2010-06-15  6:58                                 ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-15  6:58 UTC (permalink / raw)
  To: balbir; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

On 06/14/2010 08:45 PM, Balbir Singh wrote:
>
>> There are two decisions that need to be made:
>>
>> - how much memory a guest should be given
>> - given some guest memory, what's the best use for it
>>
>> The first question can perhaps be answered by looking at guest I/O
>> rates and giving more memory to more active guests.  The second
>> question is hard, but not any different than running non-virtualized
>> - except if we can detect sharing or duplication.  In this case,
>> dropping a duplicated page is worthwhile, while dropping a shared
>> page provides no benefit.
>>      
> I think there is another way of looking at it, give some free memory
>
> 1. Can the guest run more applications or run faster
>    

That's my second question.  How to best use this memory.  More 
applications == drop the page from cache, faster == keep page in cache.

All we need is to select the right page to drop.

> 2. Can the host potentially get this memory via ballooning or some
> other means to start newer guest instances
>    

Well, we already have ballooning.  The question is can we improve the 
eviction algorithm.

> I think the answer to 1 and 2 is yes.
>
>    
>> How the patch helps answer either question, I'm not sure.  I don't
>> think preferential dropping of unmapped page cache is the answer.
>>
>>      
> Preferential dropping as selected by the host, that knows about the
> setup and if there is duplication involved. While we use the term
> preferential dropping, remember it is still via LRU and we don't
> always succeed. It is a best effort (if you can and the unmapped pages
> are not highly referenced) scenario.
>    

How can the host tell if there is duplication?  It may know it has some 
pagecache, but it has no idea whether or to what extent guest pagecache 
duplicates host pagecache.

>>> Those tell you how to balance going after the different classes of
>>> things that we can reclaim.
>>>
>>> Again, this is useless when ballooning is being used.  But, I'm thinking
>>> of a more general mechanism to force the system to both have MemFree
>>> _and_ be acting as if it is under memory pressure.
>>>        
>> If there is no memory pressure on the host, there is no reason for
>> the guest to pretend it is under pressure.  If there is memory
>> pressure on the host, it should share the pain among its guests by
>> applying the balloon.  So I don't think voluntarily dropping cache
>> is a good direction.
>>
>>      
> There are two situations
>
> 1. Voluntarily drop cache, if it was setup to do so (the host knows
> that it caches that information anyway)
>    

It doesn't, really.  The host only has aggregate information about 
itself, and no information about the guest.

Dropping duplicate pages would be good if we could identify them.  Even 
then, it's better to drop the page from the host, not the guest, unless 
we know the same page is cached by multiple guests.

But why would the guest voluntarily drop the cache?  If there is no 
memory pressure, dropping caches increases cpu overhead and latency even 
if the data is still cached on the host.

> 2. Drop the cache on either a special balloon option, again the host
> knows it caches that very same information, so it prefers to free that
> up first.
>    

Dropping in response to pressure is good.  I'm just not convinced the 
patch helps in selecting the correct page to drop.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 17:58                               ` Dave Hansen
@ 2010-06-15  7:07                                 ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-15  7:07 UTC (permalink / raw)
  To: Dave Hansen; +Cc: balbir, kvm, linux-mm, linux-kernel

On 06/14/2010 08:58 PM, Dave Hansen wrote:
> On Mon, 2010-06-14 at 19:34 +0300, Avi Kivity wrote:
>    
>>> Again, this is useless when ballooning is being used.  But, I'm thinking
>>> of a more general mechanism to force the system to both have MemFree
>>> _and_ be acting as if it is under memory pressure.
>>>
>>>        
>> If there is no memory pressure on the host, there is no reason for the
>> guest to pretend it is under pressure.
>>      
> I can think of quite a few places where this would be beneficial.
>
> Ballooning is dangerous.  I've OOMed quite a few guests by
> over-ballooning them.  Anything that's voluntary like this is safer than
> things imposed by the host, although you do trade of effectiveness.
>    

That's a bug that needs to be fixed.  Eventually the host will come 
under pressure and will balloon the guest.  If that kills the guest, the 
ballooning is not effective as a host memory management technique.

Trying to defer ballooning by voluntarily dropping cache is simply 
trying to defer being bitten by the bug.

> If all the guests do this, then it leaves that much more free memory on
> the host, which can be used flexibly for extra host page cache, new
> guests, etc...

If the host detects lots of pagecache misses it can balloon guests 
down.  If pagecache is quiet, why change anything?

If the host wants to start new guests, it can balloon guests down.  If 
no new guests are wanted, why change anything?

etc...

> A system in this state where everyone is proactively
> keeping their footprints down is more likely to be able to handle load
> spikes.

That is true.  But from the guest's point of view, voluntarily giving up 
memory means dropping the guest's cushion against load spikes.

> Reclaim is an expensive, costly activity, and this ensures that
> we don't have to do that when we're busy doing other things like
> handling load spikes.

The guest doesn't want to reclaim memory from the host when it's under a 
load spike either.

> This was one of the concepts behind CMM2: reduce
> the overhead during peak periods.
>    

Ah, but CMM2 actually reduced work being done by sharing information 
between guest and host.

> It's also handy for planning.  Guests exhibiting this behavior will
> _act_ as if they're under pressure.  That's a good thing to approximate
> how a guest will act when it _is_ under pressure.
>    

If a guest acts as if it is under pressure, then it will be slower and 
consume more cpu.  Bad for both guest and host.

>> If there is memory pressure on
>> the host, it should share the pain among its guests by applying the
>> balloon.  So I don't think voluntarily dropping cache is a good direction.
>>      
> I think we're trying to consider things slightly outside of ballooning
> at this point.  If ballooning was the end-all solution, I'm fairly sure
> Balbir wouldn't be looking at this stuff.  Just trying to keep options
> open. :)
>    

I see this as an extension to ballooning - perhaps I'm missing the big 
picture.  I would dearly love to have CMM2 where decisions are made on a 
per-page basis instead of using heuristics.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 17:40                     ` Balbir Singh
@ 2010-06-15  7:11                       ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-15  7:11 UTC (permalink / raw)
  To: balbir; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

On 06/14/2010 08:40 PM, Balbir Singh wrote:
> * Avi Kivity<avi@redhat.com>  [2010-06-14 18:34:58]:
>
>    
>> On 06/14/2010 06:12 PM, Dave Hansen wrote:
>>      
>>> On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote:
>>>        
>>>> 1. A slab page will not be freed until the entire page is free (all
>>>> slabs have been kfree'd so to speak). Normal reclaim will definitely
>>>> free this page, but a lot of it depends on how frequently we are
>>>> scanning the LRU list and when this page got added.
>>>>          
>>> You don't have to be freeing entire slab pages for the reclaim to have
>>> been useful.  You could just be making space so that _future_
>>> allocations fill in the slab holes you just created.  You may not be
>>> freeing pages, but you're reducing future system pressure.
>>>        
>> Depends.  If you've evicted something that will be referenced soon,
>> you're increasing system pressure.
>>
>>      
> I don't think slab pages care about being referenced soon, they are
> either allocated or freed. A page is just a storage unit for the data
> structure. A new one can be allocated on demand.
>    

If we're talking just about slab pages, I agree.  If we're applying 
pressure on the shrinkers, then you are removing live objects which can 
be costly to reinstantiate.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-14 17:16                       ` Balbir Singh
@ 2010-06-15  7:12                         ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-15  7:12 UTC (permalink / raw)
  To: balbir; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

On 06/14/2010 08:16 PM, Balbir Singh wrote:
> * Dave Hansen<dave@linux.vnet.ibm.com>  [2010-06-14 10:09:31]:
>
>    
>> On Mon, 2010-06-14 at 22:28 +0530, Balbir Singh wrote:
>>      
>>> If you've got duplicate pages and you know
>>> that they are duplicated and can be retrieved at a lower cost, why
>>> wouldn't we go after them first?
>>>        
>> I agree with this in theory.  But, the guest lacks the information about
>> what is truly duplicated and what the costs are for itself and/or the
>> host to recreate it.  "Unmapped page cache" may be the best proxy that
>> we have at the moment for "easy to recreate", but I think it's still too
>> poor a match to make these patches useful.
>>
>>      
> That is why the policy (in the next set) will come from the host. As
> to whether the data is truly duplicated, my experiments show up to 60%
> of the page cache is duplicated.

Isn't that incredibly workload dependent?

We can't expect the host admin to know whether duplication will occur or 
not.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-15  6:58                                 ` Avi Kivity
@ 2010-06-15  7:49                                   ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-15  7:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

* Avi Kivity <avi@redhat.com> [2010-06-15 09:58:33]:

> On 06/14/2010 08:45 PM, Balbir Singh wrote:
> >
> >>There are two decisions that need to be made:
> >>
> >>- how much memory a guest should be given
> >>- given some guest memory, what's the best use for it
> >>
> >>The first question can perhaps be answered by looking at guest I/O
> >>rates and giving more memory to more active guests.  The second
> >>question is hard, but not any different than running non-virtualized
> >>- except if we can detect sharing or duplication.  In this case,
> >>dropping a duplicated page is worthwhile, while dropping a shared
> >>page provides no benefit.
> >I think there is another way of looking at it, give some free memory
> >
> >1. Can the guest run more applications or run faster
> 
> That's my second question.  How to best use this memory.  More
> applications == drop the page from cache, faster == keep page in
> cache.
> 
> All we need is to select the right page to drop.
>

Do we need to go down to the granularity of the exact page to drop? I
think figuring out the class of pages, and making sure that we don't
write our own reclaim logic but work with what we already have to
identify that class, is a good start.
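
As a rough way to see how large that class is on a running guest, one
can approximate "unmapped page cache" as Cached minus Mapped from
/proc/meminfo. A small, illustrative userspace helper along those
lines is below; it is only an approximation (shmem and other details
blur it) and is not part of the patches:

#include <stdio.h>
#include <string.h>

/* Rough estimate of unmapped page cache: Cached - Mapped from /proc/meminfo. */
int main(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[128];
	long cached = -1, mapped = -1;

	if (!f) {
		perror("/proc/meminfo");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "Cached:", 7))
			sscanf(line + 7, "%ld", &cached);
		else if (!strncmp(line, "Mapped:", 7))
			sscanf(line + 7, "%ld", &mapped);
	}
	fclose(f);

	if (cached < 0 || mapped < 0) {
		fprintf(stderr, "could not parse /proc/meminfo\n");
		return 1;
	}

	printf("approx. unmapped page cache: %ld kB\n", cached - mapped);
	return 0;
}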
 
> >2. Can the host potentially get this memory via ballooning or some
> >other means to start newer guest instances
> 
> Well, we already have ballooning.  The question is can we improve
> the eviction algorithm.
> 
> >I think the answer to 1 and 2 is yes.
> >
> >>How the patch helps answer either question, I'm not sure.  I don't
> >>think preferential dropping of unmapped page cache is the answer.
> >>
> >Preferential dropping as selected by the host, that knows about the
> >setup and if there is duplication involved. While we use the term
> >preferential dropping, remember it is still via LRU and we don't
> >always succeed. It is a best effort (if you can and the unmapped pages
> >are not highly referenced) scenario.
> 
> How can the host tell if there is duplication?  It may know it has
> some pagecache, but it has no idea whether or to what extent guest
> pagecache duplicates host pagecache.
> 

Well, it is possible in host user space. I, for example, use the
memory cgroup, and through its stats I have a good idea of how much is
duplicated. I am of course making an assumption with my setup of the
cached mode: that the data in the guest page cache and the page cache
in the cgroup will be duplicated to a large extent. I did some trivial
experiments, like dropping the data from the guest and looking at the
cost of bringing it back in, versus dropping the data from both guest
and host and looking at the cost. I could see a difference.

Unfortunately, I did not save the data, so I'll need to redo the
experiment.
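
For what it's worth, that host-side check can be approximated from
user space by reading the "cache" field of the guest's memory cgroup.
The snippet below is only a sketch of the idea, not tooling from the
patches; the cgroup path is an assumption (cgroup v1 memory controller
mounted at /cgroup, one group per guest), and the comparison against
the guest's own page cache figure still has to be done out of band:

#include <stdio.h>
#include <string.h>

/* Illustrative: report how much page cache the host charges to one guest's cgroup. */
int main(int argc, char **argv)
{
	char path[256], line[128];
	unsigned long long cache_bytes = 0;
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <guest-cgroup-name>\n", argv[0]);
		return 1;
	}

	snprintf(path, sizeof(path), "/cgroup/%s/memory.stat", argv[1]);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "cache %llu", &cache_bytes) == 1)
			break;
	}
	fclose(f);

	/*
	 * Comparing this against the guest's own "Cached:" value (from
	 * /proc/meminfo inside the guest) gives a rough upper bound on
	 * how much of the guest page cache is duplicated on the host.
	 */
	printf("host-side cache charged to %s: %llu bytes\n", argv[1], cache_bytes);
	return 0;
}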

> >>>Those tell you how to balance going after the different classes of
> >>>things that we can reclaim.
> >>>
> >>>Again, this is useless when ballooning is being used.  But, I'm thinking
> >>>of a more general mechanism to force the system to both have MemFree
> >>>_and_ be acting as if it is under memory pressure.
> >>If there is no memory pressure on the host, there is no reason for
> >>the guest to pretend it is under pressure.  If there is memory
> >>pressure on the host, it should share the pain among its guests by
> >>applying the balloon.  So I don't think voluntarily dropping cache
> >>is a good direction.
> >>
> >There are two situations
> >
> >1. Voluntarily drop cache, if it was setup to do so (the host knows
> >that it caches that information anyway)
> 
> It doesn't, really.  The host only has aggregate information about
> itself, and no information about the guest.
> 
> Dropping duplicate pages would be good if we could identify them.
> Even then, it's better to drop the page from the host, not the
> guest, unless we know the same page is cached by multiple guests.
>

On the exact pages to drop, please see my comments above on the class
of pages to drop. There are reasons for wanting to get the host to
cache the data:

1. Unless the guest is using cache = none, the data will still hit the
host page cache.
2. The host can do a better job of optimizing the writeouts.
 
> But why would the guest voluntarily drop the cache?  If there is no
> memory pressure, dropping caches increases cpu overhead and latency
> even if the data is still cached on the host.
> 

So, there are basically two approaches:

1. The first patch: proactive, enabled by a boot option.
2. When ballooned, we try to (please NOTE, try to) reclaim cached
pages first; failing that, we go after regular pages via the
alloc_page() call in the balloon driver (a rough sketch of this
fallback follows below).
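
For the second approach, a very rough sketch of the balloon-side
fallback is below. It is not the actual patch: reclaim_unmapped_cache()
is a hypothetical stand-in for whatever interface the patch really
provides, and the allocation flags are only indicative:

#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Hypothetical hook, standing in for the patch's real interface:
 * ask reclaim to free up to @nr_pages of unmapped page cache and
 * return how many were actually freed.
 */
extern unsigned long reclaim_unmapped_cache(unsigned long nr_pages);

/*
 * Sketch of a balloon "fill" step: first try to make room by dropping
 * unmapped page cache (cheap to recreate, likely duplicated on the
 * host), and only then fall back to taking a regular page.
 */
static struct page *balloon_grab_page(void)
{
	struct page *page;

	/* Best effort: this may free fewer pages than asked, or none. */
	reclaim_unmapped_cache(1);

	page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY | __GFP_NOWARN);
	return page;	/* NULL if even that failed */
}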

> >2. Drop the cache on either a special balloon option, again the host
> >knows it caches that very same information, so it prefers to free that
> >up first.
> 
> Dropping in response to pressure is good.  I'm just not convinced
> the patch helps in selecting the correct page to drop.
>

That is why I've presented data on the experiments I've run and
provided more arguments to back up the approach.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-15  7:12                         ` Avi Kivity
@ 2010-06-15  7:52                           ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-15  7:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

* Avi Kivity <avi@redhat.com> [2010-06-15 10:12:44]:

> On 06/14/2010 08:16 PM, Balbir Singh wrote:
> >* Dave Hansen<dave@linux.vnet.ibm.com>  [2010-06-14 10:09:31]:
> >
> >>On Mon, 2010-06-14 at 22:28 +0530, Balbir Singh wrote:
> >>>If you've got duplicate pages and you know
> >>>that they are duplicated and can be retrieved at a lower cost, why
> >>>wouldn't we go after them first?
> >>I agree with this in theory.  But, the guest lacks the information about
> >>what is truly duplicated and what the costs are for itself and/or the
> >>host to recreate it.  "Unmapped page cache" may be the best proxy that
> >>we have at the moment for "easy to recreate", but I think it's still too
> >>poor a match to make these patches useful.
> >>
> >That is why the policy (in the next set) will come from the host. As
> >to whether the data is truly duplicated, my experiments show up to 60%
> >of the page cache is duplicated.
> 
> Isn't that incredibly workload dependent?
> 
> We can't expect the host admin to know whether duplication will
> occur or not.
>

I was referring to the cache= (policy) we use based on the setup. I
don't think the duplication is too workload-specific. Moreover, we
could use aggressive policies and restrict page cache usage, or do it
selectively on ballooning. We could also add other options to make the
ballooning option truly optional, so that the system management
software decides.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-15  7:49                                   ` Balbir Singh
@ 2010-06-15  9:44                                     ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-15  9:44 UTC (permalink / raw)
  To: balbir; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

On 06/15/2010 10:49 AM, Balbir Singh wrote:
>
>> All we need is to select the right page to drop.
>>
>>      
> Do we need to drop to the granularity of the page to drop? I think
> figuring out the class of pages and making sure that we don't write
> our own reclaim logic, but work with what we have to identify the
> class of pages is a good start.
>    

Well, the class of pages we want is 'pages that are duplicated on the
host'.  Unmapped page cache pages are 'pages that might be duplicated on
the host'.  IMO, that's not close enough.

>> How can the host tell if there is duplication?  It may know it has
>> some pagecache, but it has no idea whether or to what extent guest
>> pagecache duplicates host pagecache.
>>
>>      
> Well it is possible in host user space, I for example use memory
> cgroup and through the stats I have a good idea of how much is duplicated.
> I am ofcourse making an assumption with my setup of the cached mode,
> that the data in the guest page cache and page cache in the cgroup
> will be duplicated to a large extent. I did some trivial experiments
> like drop the data from the guest and look at the cost of bringing it
> in and dropping the data from both guest and host and look at the
> cost. I could see a difference.
>
> Unfortunately, I did not save the data, so I'll need to redo the
> experiment.
>    

I'm sure we can detect it experimentally, but how do we do it 
programmatically at run time (without dropping all the pages)?  
Situations change, and I don't think we can infer from a few experiments 
that we'll have a similar amount of sharing.  The cost of an incorrect 
decision is too high IMO (not that I think the kernel always chooses the 
right pages now, but I'd like to avoid regressions from the 
unvirtualized state).

btw, when running with a disk controller that has a very large cache, we 
might also see duplication between "guest" and host.  So, if this is a 
good idea, it shouldn't be enabled just for virtualization, but for any 
situation where we have a sizeable cache behind us.

>> It doesn't, really.  The host only has aggregate information about
>> itself, and no information about the guest.
>>
>> Dropping duplicate pages would be good if we could identify them.
>> Even then, it's better to drop the page from the host, not the
>> guest, unless we know the same page is cached by multiple guests.
>>
>>      
> On the exact pages to drop, please see my comments above on the class
> of pages to drop.
>    

Well, we disagree about that.  There is some value in dropping 
duplicated pages (not always), but that's not what the patch does.  It 
drops unmapped pagecache pages, which may or may not be duplicated.

> There are reasons for wanting to get the host to cache the data
>    

There are also reasons to get the guest to cache the data - it's more 
efficient to access it in the guest.

> Unless the guest is using cache = none, the data will still hit the
> host page cache
> The host can do a better job of optimizing the writeouts
>    

True, especially for non-raw storage.  But even there we have to fsync 
all the time to keep the metadata right.

>> But why would the guest voluntarily drop the cache?  If there is no
>> memory pressure, dropping caches increases cpu overhead and latency
>> even if the data is still cached on the host.
>>
>>      
> So, there are basically two approaches
>
> 1. First patch, proactive - enabled by a boot option
> 2. When ballooned, we try to (please NOTE try to) reclaim cached pages
> first. Failing which, we go after regular pages in the alloc_page()
> call in the balloon driver.
>    

Doesn't that mean you may evict a recently used mapped page ahead of a 
least recently used unmapped page, just in the hope that it is double-cached?

Maybe we need the guest and host to talk to each other about which pages 
to keep.

>>> 2. Drop the cache on either a special balloon option, again the host
>>> knows it caches that very same information, so it prefers to free that
>>> up first.
>>>        
>> Dropping in response to pressure is good.  I'm just not convinced
>> the patch helps in selecting the correct page to drop.
>>
>>      
> That is why I've presented data on the experiments I've run and
> provided more arguments to backup the approach.
>    

I'm still unconvinced, sorry.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-15  7:52                           ` Balbir Singh
@ 2010-06-15  9:54                             ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-15  9:54 UTC (permalink / raw)
  To: balbir; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

On 06/15/2010 10:52 AM, Balbir Singh wrote:
>>>
>>> That is why the policy (in the next set) will come from the host. As
>>> to whether the data is truly duplicated, my experiments show up to 60%
>>> of the page cache is duplicated.
>>>        
>> Isn't that incredibly workload dependent?
>>
>> We can't expect the host admin to know whether duplication will
>> occur or not.
>>
>>      
> I was referring to cache = (policy) we use based on the setup. I don't
> think the duplication is too workload specific. Moreover, we could use
> aggressive policies and restrict page cache usage or do it selectively
> on ballooning. We could also add other options to make the ballooning
> option truly optional, so that the system management software decides.
>    

Consider a read-only workload that exactly fits in guest cache.  Without 
trimming, the guest will keep hitting its own cache, and the host will 
see no access to the cache at all.  So the host (assuming it is under 
even low pressure) will evict those pages, and the guest will happily 
use its own cache.  If we start to trim, the guest will have to go to 
disk.  That's the best case.

Now for the worst case.  A random access workload that misses the cache 
on both guest and host.  Now every page is duplicated, and trimming 
guest pages allows the host to increase its cache, and potentially 
reduce misses.  In this case trimming duplicated pages works.

Real life will see a mix of this.  Often used pages won't be duplicated, 
and less often used pages may see some duplication, especially if the 
host cache portion dedicated to the guest is bigger than the guest cache.

I can see that trimming duplicate pages helps, but (a) I'd like to be 
sure they are duplicates and (b) often trimming them from the host is 
better than trimming them from the guest.

Trimming from the guest is worthwhile if the pages are not used very 
often (but enough that caching them in the host is worth it) and if the 
host cache can serve more than one guest.  If we can identify those 
pages, we don't risk degrading best-case workloads (as defined above).

(note ksm to some extent identifies those pages, though it is a bit 
expensive, and doesn't share with the host pagecache).
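
To put rough numbers on that trade-off, here is a toy model (the costs and
hit rates below are my own illustrative assumptions, not measurements from
anyone's setup): expected read cost given guest and host cache hit rates.
It only shows why trimming the guest pays off when the trimmed pages are
very likely to still be served from the host cache.

#include <stdio.h>

/* expected cost of one read, given guest/host hit probabilities */
static double expected_cost(double g_hit, double h_hit,
			    double c_guest, double c_host, double c_disk)
{
	return g_hit * c_guest +
	       (1.0 - g_hit) * (h_hit * c_host + (1.0 - h_hit) * c_disk);
}

int main(void)
{
	/* assumed costs in microseconds: guest RAM hit, host cache hit, disk */
	const double c_guest = 1, c_host = 50, c_disk = 5000;

	/* best case above: working set fits in guest cache, no trimming */
	printf("untrimmed, fits in guest cache: %6.0f us\n",
	       expected_cost(1.0, 0.0, c_guest, c_host, c_disk));
	/* same workload after trimming: every access exits to the host */
	printf("trimmed, served from host:      %6.0f us\n",
	       expected_cost(0.0, 1.0, c_guest, c_host, c_disk));
	/* worst case above: random access missing both caches */
	printf("random, misses both caches:     %6.0f us\n",
	       expected_cost(0.1, 0.1, c_guest, c_host, c_disk));
	return 0;
}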

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-15  9:44                                     ` Avi Kivity
@ 2010-06-15 10:18                                       ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-15 10:18 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

* Avi Kivity <avi@redhat.com> [2010-06-15 12:44:31]:

> On 06/15/2010 10:49 AM, Balbir Singh wrote:
> >
> >>All we need is to select the right page to drop.
> >>
> >Do we need to drop to the granularity of the page to drop? I think
> >figuring out the class of pages and making sure that we don't write
> >our own reclaim logic, but work with what we have to identify the
> >class of pages is a good start.
> 
> Well, the class of pages are 'pages that are duplicated on the
> host'.  Unmapped page cache pages are 'pages that might be
> duplicated on the host'.  IMO, that's not close enough.
>

Agreed, but what happens in reality with the code is that it drops
not-so-frequently-used cache (still reusing the reclaim mechanism),
prioritizing cached memory over mapped pages.
 
> >>How can the host tell if there is duplication?  It may know it has
> >>some pagecache, but it has no idea whether or to what extent guest
> >>pagecache duplicates host pagecache.
> >>
> >Well it is possible in host user space, I for example use memory
> >cgroup and through the stats I have a good idea of how much is duplicated.
> >I am ofcourse making an assumption with my setup of the cached mode,
> >that the data in the guest page cache and page cache in the cgroup
> >will be duplicated to a large extent. I did some trivial experiments
> >like drop the data from the guest and look at the cost of bringing it
> >in and dropping the data from both guest and host and look at the
> >cost. I could see a difference.
> >
> >Unfortunately, I did not save the data, so I'll need to redo the
> >experiment.
> 
> I'm sure we can detect it experimentally, but how do we do it
> programatically at run time (without dropping all the pages).
> Situations change, and I don't think we can infer from a few
> experiments that we'll have a similar amount of sharing.  The cost
> of an incorrect decision is too high IMO (not that I think the
> kernel always chooses the right pages now, but I'd like to avoid
> regressions from the unvirtualized state).
> 
> btw, when running with a disk controller that has a very large
> cache, we might also see duplication between "guest" and host.  So,
> if this is a good idea, it shouldn't be enabled just for
> virtualization, but for any situation where we have a sizeable cache
> behind us.
> 

It depends: once the disk controller has the data cached and the pages in
the guest are not-so-frequently-used, we can drop them. Please remember
we still use the LRU to identify these pages.

> >>It doesn't, really.  The host only has aggregate information about
> >>itself, and no information about the guest.
> >>
> >>Dropping duplicate pages would be good if we could identify them.
> >>Even then, it's better to drop the page from the host, not the
> >>guest, unless we know the same page is cached by multiple guests.
> >>
> >On the exact pages to drop, please see my comments above on the class
> >of pages to drop.
> 
> Well, we disagree about that.  There is some value in dropping
> duplicated pages (not always), but that's not what the patch does.
> It drops unmapped pagecache pages, which may or may not be
> duplicated.
> 
> >There are reasons for wanting to get the host to cache the data
> 
> There are also reasons to get the guest to cache the data - it's
> more efficient to access it in the guest.
> 
> >Unless the guest is using cache = none, the data will still hit the
> >host page cache
> >The host can do a better job of optimizing the writeouts
> 
> True, especially for non-raw storage.  But even there we have to
> fsync all the time to keep the metadata right.
> 
> >>But why would the guest voluntarily drop the cache?  If there is no
> >>memory pressure, dropping caches increases cpu overhead and latency
> >>even if the data is still cached on the host.
> >>
> >So, there are basically two approaches
> >
> >1. First patch, proactive - enabled by a boot option
> >2. When ballooned, we try to (please NOTE try to) reclaim cached pages
> >first. Failing which, we go after regular pages in the alloc_page()
> >call in the balloon driver.
> 
> Doesn't that mean you may evict a RU mapped page ahead of an LRU
> unmapped page, just in the hope that it is double-cached?
> 
> Maybe we need the guest and host to talk to each other about which
> pages to keep.
> 

Yeah.. I guess that falls into the domain of CMM.

> >>>2. Drop the cache on either a special balloon option, again the host
> >>>knows it caches that very same information, so it prefers to free that
> >>>up first.
> >>Dropping in response to pressure is good.  I'm just not convinced
> >>the patch helps in selecting the correct page to drop.
> >>
> >That is why I've presented data on the experiments I've run and
> >provided more arguments to backup the approach.
> 
> I'm still unconvinced, sorry.
> 

The reason for making this optional is to let the administrators
decide how they want to use the memory in the system. In some
situations it might be a big no-no to waste memory; in others it
might be acceptable.

-- 
	Three Cheers,
	Balbir


* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-15  9:54                             ` Avi Kivity
@ 2010-06-15 12:49                               ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-15 12:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

* Avi Kivity <avi@redhat.com> [2010-06-15 12:54:31]:

> On 06/15/2010 10:52 AM, Balbir Singh wrote:
> >>>
> >>>That is why the policy (in the next set) will come from the host. As
> >>>to whether the data is truly duplicated, my experiments show up to 60%
> >>>of the page cache is duplicated.
> >>Isn't that incredibly workload dependent?
> >>
> >>We can't expect the host admin to know whether duplication will
> >>occur or not.
> >>
> >I was referring to cache = (policy) we use based on the setup. I don't
> >think the duplication is too workload specific. Moreover, we could use
> >aggressive policies and restrict page cache usage or do it selectively
> >on ballooning. We could also add other options to make the ballooning
> >option truly optional, so that the system management software decides.
> 
> Consider a read-only workload that exactly fits in guest cache.
> Without trimming, the guest will keep hitting its own cache, and the
> host will see no access to the cache at all.  So the host (assuming
> it is under even low pressure) will evict those pages, and the guest
> will happily use its own cache.  If we start to trim, the guest will
> have to go to disk.  That's the best case.
>
> Now for the worst case.  A random access workload that misses the
> cache on both guest and host.  Now every page is duplicated, and
> trimming guest pages allows the host to increase its cache, and
> potentially reduce misses.  In this case trimming duplicated pages
> works.
> 
> Real life will see a mix of this.  Often used pages won't be
> duplicated, and less often used pages may see some duplication,
> especially if the host cache portion dedicated to the guest is
> bigger than the guest cache.
> 
> I can see that trimming duplicate pages helps, but (a) I'd like to
> be sure they are duplicates and (b) often trimming them from the
> host is better than trimming them from the guest.
>

Let's look at the behaviour with these patches.

The first patch is a proactive approach to keep more memory around.
Enabling the parameter implies we are OK paying the cost of some
overhead. My data shows that it leaves a significant amount of free
memory with a small (5% in my case) overhead. This brings us back to
what you can do with free memory.

The second patch shows no overhead and selectively tries to reclaim
freeable cache to return memory on memory pressure (as indicated by the
balloon driver). We've discussed the reasons for doing this:

1. In situations where the cache is duplicated this should benefit
us. Your contention is that we need to be specific about the
duplication. That falls under the realm of CMM.
2. In the case of slab cache, duplication does not matter; it is a
free page that should ideally be reclaimed ahead of mapped pages.
If the slab grows, it will get another new page.

What is the cost of (1)?

In the worst case, we select a non-duplicated page, but for us to
select it, it must be inactive; in that case we do I/O to bring the
page back.
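
As a rough sketch of the idea behind the second patch (not the posted
code; reclaim_unmapped_pages() is a hypothetical stand-in for the
zone_reclaim()-style helper the series reuses), the balloon fill path
would look something like:

#include <linux/gfp.h>
#include <linux/mm.h>

/* hypothetical helper: try to free nr pages of unmapped cache/slab */
unsigned long reclaim_unmapped_pages(unsigned long nr);

static struct page *balloon_get_page(void)
{
	/*
	 * First try to free cheap-to-recreate memory (unmapped page
	 * cache, slab), so the allocation below is less likely to
	 * displace mapped pages.
	 */
	reclaim_unmapped_pages(1);

	/* fall back to any free page, as the driver does today */
	return alloc_page(GFP_HIGHUSER | __GFP_NORETRY | __GFP_NOWARN);
}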

> Trimming from the guest is worthwhile if the pages are not used very
> often (but enough that caching them in the host is worth it) and if
> the host cache can serve more than one guest.  If we can identify
> those pages, we don't risk degrading best-case workloads (as defined
> above).
> 
> (note ksm to some extent identifies those pages, though it is a bit
> expensive, and doesn't share with the host pagecache).
>

I see that you are hinting at finding exact duplicates; I don't
know if the cost and complexity justify it. I hope more users can try
the patches with and without the boot parameter and provide additional
feedback.

-- 
	Three Cheers,
	Balbir


* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-15  7:07                                 ` Avi Kivity
@ 2010-06-15 14:47                                   ` Dave Hansen
  -1 siblings, 0 replies; 96+ messages in thread
From: Dave Hansen @ 2010-06-15 14:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: balbir, kvm, linux-mm, linux-kernel

On Tue, 2010-06-15 at 10:07 +0300, Avi Kivity wrote:
> On 06/14/2010 08:58 PM, Dave Hansen wrote:
> > On Mon, 2010-06-14 at 19:34 +0300, Avi Kivity wrote:
> >    
> >>> Again, this is useless when ballooning is being used.  But, I'm thinking
> >>> of a more general mechanism to force the system to both have MemFree
> >>> _and_ be acting as if it is under memory pressure.
> >>>
> >>>        
> >> If there is no memory pressure on the host, there is no reason for the
> >> guest to pretend it is under pressure.
> >>      
> > I can think of quite a few places where this would be beneficial.
> >
> > Ballooning is dangerous.  I've OOMed quite a few guests by
> > over-ballooning them.  Anything that's voluntary like this is safer than
> > things imposed by the host, although you do trade of effectiveness.
> 
> That's a bug that needs to be fixed.  Eventually the host will come 
> under pressure and will balloon the guest.  If that kills the guest, the 
> ballooning is not effective as a host memory management technique.

I'm not convinced that it's just a bug that can be fixed.  Consider a
case where a host sees a guest with 100MB of free memory at the exact
moment that a database app sees that memory.  The host tries to balloon
that memory away at the same time that the app goes and allocates it.
That can certainly lead to an OOM very quickly, even for very small
amounts of memory (much less than 100MB).  Where's the bug?

I think the issues are really fundamental to ballooning.

> > If all the guests do this, then it leaves that much more free memory on
> > the host, which can be used flexibly for extra host page cache, new
> > guests, etc...
> 
> If the host detects lots of pagecache misses it can balloon guests 
> down.  If pagecache is quiet, why change anything?

Page cache misses alone are not really sufficient.  This is the classic
problem where we try to differentiate streaming I/O (which we can't
effectively cache) from I/O which can be effectively cached.
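
(As an aside, and not something anyone proposed in this thread: an
application that knows it is streaming can already mark its pages as not
worth caching, e.g. with posix_fadvise(); the point above is precisely
that the guest and host have no such per-page knowledge on their own.)

#include <fcntl.h>
#include <unistd.h>

/* toy streaming reader that drops its own cache behind itself */
int main(int argc, char **argv)
{
	static char buf[1 << 20];
	off_t off = 0;
	ssize_t n;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0)
		return 1;

	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		/* done with this chunk, advise the kernel it can drop it */
		posix_fadvise(fd, off, n, POSIX_FADV_DONTNEED);
		off += n;
	}
	close(fd);
	return 0;
}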

> If the host wants to start new guests, it can balloon guests down.  If 
> no new guests are wanted, why change anything?

We're talking about an environment which we're always trying to
optimize.  Imagine that we're always trying to consolidate guests on to
smaller numbers of hosts.  We're effectively in a state where we
_always_ want new guests.

-- Dave



* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-15 14:47                                   ` Dave Hansen
@ 2010-06-16 11:39                                     ` Avi Kivity
  -1 siblings, 0 replies; 96+ messages in thread
From: Avi Kivity @ 2010-06-16 11:39 UTC (permalink / raw)
  To: Dave Hansen; +Cc: balbir, kvm, linux-mm, linux-kernel

On 06/15/2010 05:47 PM, Dave Hansen wrote:
>
>> That's a bug that needs to be fixed.  Eventually the host will come
>> under pressure and will balloon the guest.  If that kills the guest, the
>> ballooning is not effective as a host memory management technique.
>>      
> I'm not convinced that it's just a bug that can be fixed.  Consider a
> case where a host sees a guest with 100MB of free memory at the exact
> moment that a database app sees that memory.  The host tries to balloon
> that memory away at the same time that the app goes and allocates it.
> That can certainly lead to an OOM very quickly, even for very small
> amounts of memory (much less than 100MB).  Where's the bug?
>
> I think the issues are really fundamental to ballooning.
>    

There are two issues involved.

One is, can the kernel accurately determine the amount of memory it 
needs to work?  We have resources such as RAM and swap.  We have 
liabilities in the form of swappable userspace memory, mlocked userspace 
memory, kernel memory to support these, and various reclaimable and 
non-reclaimable kernel caches.  Can we determine the minimum amount of 
RAM to support our workload at a point in time?

If we had this, we could modify the balloon to refuse to balloon if it 
takes the kernel beneath the minimum amount of RAM needed.

In fact, this is similar to allocating memory with overcommit_memory = 
0.  The difference is the balloon allocates mlocked memory, while normal 
allocations can be charged against swap.  But fundamentally it's the same.
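
A minimal sketch of such a guard, with the counters spelled as the vmstat
counters of that time and an arbitrary ~3% reserve (both the estimate and
the reserve are my assumptions, not an existing interface):

#include <linux/types.h>
#include <linux/mm.h>
#include <linux/vmstat.h>
#include <linux/swap.h>

/*
 * Sketch only: refuse to inflate the balloon by nr_pages if the
 * estimate of easily freeable memory would drop below a reserve.
 */
static bool balloon_inflate_allowed(unsigned long nr_pages)
{
	unsigned long can_free;

	can_free  = global_page_state(NR_FREE_PAGES);
	can_free += global_page_state(NR_FILE_PAGES) -
		    global_page_state(NR_FILE_MAPPED);	/* unmapped cache */
	can_free += nr_swap_pages;			/* free swap slots */

	return can_free > nr_pages + totalram_pages / 32;  /* ~3% reserve */
}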

>>> If all the guests do this, then it leaves that much more free memory on
>>> the host, which can be used flexibly for extra host page cache, new
>>> guests, etc...
>>>        
>> If the host detects lots of pagecache misses it can balloon guests
>> down.  If pagecache is quiet, why change anything?
>>      
> Page cache misses alone are not really sufficient.  This is the classic
> problem where we try to differentiate streaming I/O (which we can't
> effectively cache) from I/O which can be effectively cached.
>    

True.  Random I/O across a very large dataset is also difficult to cache.

>> If the host wants to start new guests, it can balloon guests down.  If
>> no new guests are wanted, why change anything?
>>      
> We're talking about an environment which we're always trying to
> optimize.  Imagine that we're always trying to consolidate guests on to
> smaller numbers of hosts.  We're effectively in a state where we
> _always_ want new guests.
>    

If this came at no cost to the guests, you'd be right.  But at some 
point guest performance will be hit by this, so the advantage gained 
from freeing memory will be balanced by the disadvantage.

Also, memory is not the only resource.  At some point you become cpu 
bound; at that point freeing memory doesn't help and in fact may 
increase your cpu load.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control
  2010-06-16 11:39                                     ` Avi Kivity
@ 2010-06-17  6:04                                       ` Balbir Singh
  -1 siblings, 0 replies; 96+ messages in thread
From: Balbir Singh @ 2010-06-17  6:04 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dave Hansen, kvm, linux-mm, linux-kernel

* Avi Kivity <avi@redhat.com> [2010-06-16 14:39:02]:

> >We're talking about an environment which we're always trying to
> >optimize.  Imagine that we're always trying to consolidate guests on to
> >smaller numbers of hosts.  We're effectively in a state where we
> >_always_ want new guests.
> 
> If this came at no cost to the guests, you'd be right.  But at some
> point guest performance will be hit by this, so the advantage gained
> from freeing memory will be balanced by the disadvantage.
> 
> Also, memory is not the only resource.  At some point you become cpu
> bound; at that point freeing memory doesn't help and in fact may
> increase your cpu load.
>

We'll probably need control over other resources as well, but IMHO
memory is the most precious because it is non-renewable. 

-- 
	Three Cheers,
	Balbir


end of thread

Thread overview: 96+ messages
2010-06-08 15:51 [RFC/T/D][PATCH 0/2] KVM page cache optimization (v2) Balbir Singh
2010-06-08 15:51 ` [RFC][PATCH 1/2] Linux/Guest unmapped page cache control Balbir Singh
2010-06-13 18:31   ` Balbir Singh
2010-06-14  0:28     ` KAMEZAWA Hiroyuki
2010-06-14  6:49       ` Balbir Singh
2010-06-14  7:00         ` KAMEZAWA Hiroyuki
2010-06-14  7:36           ` Balbir Singh
2010-06-14  7:49             ` KAMEZAWA Hiroyuki
2010-06-08 15:51 ` [RFC/T/D][PATCH 2/2] Linux/Guest cooperative " Balbir Singh
2010-06-10  9:43   ` Avi Kivity
2010-06-10 14:25     ` Balbir Singh
2010-06-11  0:07       ` Dave Hansen
2010-06-11  1:54         ` KAMEZAWA Hiroyuki
2010-06-11  4:46           ` Balbir Singh
2010-06-11  5:05             ` KAMEZAWA Hiroyuki
2010-06-11  5:08               ` KAMEZAWA Hiroyuki
2010-06-11  6:14               ` Balbir Singh
2010-06-11  4:56         ` Balbir Singh
2010-06-14  8:09           ` Avi Kivity
2010-06-14  8:48             ` Balbir Singh
2010-06-14 12:40               ` Avi Kivity
2010-06-14 12:50                 ` Balbir Singh
2010-06-14 13:01                   ` Avi Kivity
2010-06-14 15:33                     ` Dave Hansen
2010-06-14 15:44                       ` Avi Kivity
2010-06-14 15:55                         ` Dave Hansen
2010-06-14 16:34                           ` Avi Kivity
2010-06-14 17:45                             ` Balbir Singh
2010-06-15  6:58                               ` Avi Kivity
2010-06-15  7:49                                 ` Balbir Singh
2010-06-15  9:44                                   ` Avi Kivity
2010-06-15 10:18                                     ` Balbir Singh
2010-06-14 17:58                             ` Dave Hansen
2010-06-15  7:07                               ` Avi Kivity
2010-06-15 14:47                                 ` Dave Hansen
2010-06-16 11:39                                   ` Avi Kivity
2010-06-17  6:04                                     ` Balbir Singh
2010-06-14 15:12               ` Dave Hansen
2010-06-14 15:34                 ` Avi Kivity
2010-06-14 17:40                   ` Balbir Singh
2010-06-15  7:11                     ` Avi Kivity
2010-06-14 16:58                 ` Balbir Singh
2010-06-14 17:09                   ` Dave Hansen
2010-06-14 17:16                     ` Balbir Singh
2010-06-15  7:12                       ` Avi Kivity
2010-06-15  7:52                         ` Balbir Singh
2010-06-15  9:54                           ` Avi Kivity
2010-06-15 12:49                             ` Balbir Singh
