* [RFC][PATCH 0/3] KVM page cache optimization (v3)
@ 2010-10-28 22:40 ` Balbir Singh
  0 siblings, 0 replies; 18+ messages in thread
From: Balbir Singh @ 2010-10-28 22:40 UTC (permalink / raw)
  To: kvm; +Cc: linux-mm, Balbir Singh, qemu-devel

This is version 3 of the page cache control patches.

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This series has three patches: the first controls
the amount of unmapped page cache usage via a boot
parameter and a sysctl. The second patch controls page
and slab cache via the balloon driver. Both patches
make heavy use of the zone_reclaim() functionality
already present in the kernel.
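
For reference, enabling the feature would look roughly like this (a
sketch only; the boot parameter and sysctl names come from patch 1,
and the sysctl value is illustrative -- the patch defaults it to 1):

    # kernel command line: turn on unmapped page cache control
    ... unmapped_page_control ...

    # at runtime: set the per-zone floor of unmapped pages (percent)
    sysctl -w vm.min_unmapped_ratio=5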

The last patch in the series is against QEMU and makes
the ballooning hint optional.

V2 was posted a while back (see http://lwn.net/Articles/391293/).
One of the review suggestions was to make the hint optional
(this was discussed in the community call as well).

I'd appreciate any test results with the patches.

TODO

1. libvirt support for the optional hint

page-cache-control
balloon-page-cache
provide-memory-hint-during-ballooning

---
 b/balloon.c                       |   18 +++-
 b/balloon.h                       |    4
 b/drivers/virtio/virtio_balloon.c |   17 +++
 b/hmp-commands.hx                 |    7 +
 b/hw/virtio-balloon.c             |   14 ++-
 b/hw/virtio-balloon.h             |    3
 b/include/linux/gfp.h             |    8 +
 b/include/linux/mmzone.h          |    2
 b/include/linux/swap.h            |    3
 b/include/linux/virtio_balloon.h  |    3
 b/mm/page_alloc.c                 |    9 +-
 b/mm/vmscan.c                     |  162 ++++++++++++++++++++++++++++----------
 b/qmp-commands.hx                 |    7 -
 include/linux/swap.h              |    9 --
 mm/page_alloc.c                   |    3
 mm/vmscan.c                       |    2
 16 files changed, 202 insertions(+), 69 deletions(-)


-- 
	Three Cheers,
	Balbir


* [RFC][PATCH 1/3] Linux/Guest unmapped page cache control
  2010-10-28 22:40 ` [Qemu-devel] " Balbir Singh
@ 2010-10-28 22:40   ` Balbir Singh
  -1 siblings, 0 replies; 18+ messages in thread
From: Balbir Singh @ 2010-10-28 22:40 UTC (permalink / raw)
  To: kvm; +Cc: linux-mm, Balbir Singh, qemu-devel

Selectively control Unmapped Page Cache (nospam version)

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch implements unmapped page cache control via preferred
page cache reclaim. The current patch hooks into kswapd and reclaims
page cache if the user has requested unmapped page cache control.
This is useful in the following scenario:

- In a virtualized environment with cache=writethrough, we see
  double caching (once in the host and once in the guest). As
  we try to scale guests, cache usage across the system grows.
  The goal of this patch is to reclaim page cache when Linux is running
  as a guest, and to let the host hold and manage the page cache.
  There might be temporary duplication, but in the long run, memory
  in the guests would be used for mapped pages.
- The option is controlled via a boot option, so the administrator
  can selectively turn it on, on an as-needed basis.

A lot of the code is borrowed from the zone_reclaim_mode logic in
__zone_reclaim(). One might argue that with ballooning and
KSM this feature is not very useful, but even with ballooning,
we need extra logic to balloon multiple VMs, and it is hard
to figure out the correct amount of memory to balloon. With these
patches applied, each guest has a sufficient amount of free memory
available that can be easily seen and reclaimed by the balloon driver.
The additional memory in the guest can be reused for additional
applications, or used to start additional guests or balance memory in
the host.

KSM currently does not de-duplicate host and guest page cache. The goal
of this patch is to help automatically balance unmapped page cache when
instructed to do so.

The code uses a couple of magic numbers: UNMAPPED_PAGE_RATIO and
the number of pages to reclaim when the unmapped_page_control argument
is supplied. These values were chosen to avoid reaping page cache too
aggressively or too frequently, while still providing control.

The min_unmapped_ratio sysctl provides further control from
within the guest over the number of unmapped pages to reclaim.
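
The heuristics reduce to roughly the following (a condensed sketch of
the logic in the mm/vmscan.c hunk below, not standalone code):

    /* kswapd is woken when unmapped file pages exceed 16x the floor */
    if (zone_unmapped_file_pages(zone) >
            UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages)
        wakeup_kswapd(zone, order);

    /*
     * Balancing then targets an eighth of the excess above the floor,
     * where min_unmapped_pages = present_pages * min_unmapped_ratio / 100
     */
    nr_pages = (zone_unmapped_file_pages(zone) -
            zone->min_unmapped_pages) >> 3;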

Guest usage without boot parameter (memory in KB)
-------------------------------------------------
MemFree Cached Time
19900   292912 137
17540   296196 139
17900   296124 141
19356   296660 141

Host usage:  (memory in KB)

RSS     Cache   mapped  swap
2788664 781884  3780    359536

Guest usage with boot parameter (memory in KB)
----------------------------------------------
MemFree Cached   Time
244824  74828   144
237840  81764   143
235880  83044   138
239312  80092   148

Host usage: (memory in KB)

RSS     Cache   mapped  swap
2700184 958012  334848  398412

TODOS
-----
1. Balance slab cache as well

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/mmzone.h |    2 -
 include/linux/swap.h   |    3 +
 mm/page_alloc.c        |    9 ++-
 mm/vmscan.c            |  162 ++++++++++++++++++++++++++++++++++++------------
 4 files changed, 132 insertions(+), 44 deletions(-)


diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3984c4e..a591a7a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -300,12 +300,12 @@ struct zone {
 	 */
 	unsigned long		lowmem_reserve[MAX_NR_ZONES];
 
+	unsigned long		min_unmapped_pages;
 #ifdef CONFIG_NUMA
 	int node;
 	/*
 	 * zone reclaim becomes active if more unmapped pages exist.
 	 */
-	unsigned long		min_unmapped_pages;
 	unsigned long		min_slab_pages;
 #endif
 	struct per_cpu_pageset __percpu *pageset;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7cdd633..5d29097 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -251,10 +251,11 @@ extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern long vm_total_pages;
+extern bool should_balance_unmapped_pages(struct zone *zone);
 
+extern int sysctl_min_unmapped_ratio;
 #ifdef CONFIG_NUMA
 extern int zone_reclaim_mode;
-extern int sysctl_min_unmapped_ratio;
 extern int sysctl_min_slab_ratio;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
 #else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f12ad18..d8fe29f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1642,6 +1642,9 @@ zonelist_scan:
 			unsigned long mark;
 			int ret;
 
+			if (should_balance_unmapped_pages(zone))
+				wakeup_kswapd(zone, order);
+
 			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
 			if (zone_watermark_ok(zone, order, mark,
 				    classzone_idx, alloc_flags))
@@ -4101,10 +4104,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
-#ifdef CONFIG_NUMA
-		zone->node = nid;
 		zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
 						/ 100;
+#ifdef CONFIG_NUMA
+		zone->node = nid;
 		zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
 #endif
 		zone->name = zone_names[j];
@@ -5012,7 +5015,6 @@ int min_free_kbytes_sysctl_handler(ctl_table *table, int write,
 	return 0;
 }
 
-#ifdef CONFIG_NUMA
 int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -5029,6 +5031,7 @@ int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
 	return 0;
 }
 
+#ifdef CONFIG_NUMA
 int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c5dfabf..02346ad 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -139,6 +139,18 @@ static DECLARE_RWSEM(shrinker_rwsem);
 #define scanning_global_lru(sc)	(1)
 #endif
 
+static int unmapped_page_control __read_mostly;
+
+static int __init unmapped_page_control_parm(char *str)
+{
+	unmapped_page_control = 1;
+	/*
+	 * XXX: Should we tweak swappiness here?
+	 */
+	return 1;
+}
+__setup("unmapped_page_control", unmapped_page_control_parm);
+
 static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
 						  struct scan_control *sc)
 {
@@ -2091,6 +2103,102 @@ static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
 }
 
 /*
+ * Percentage of pages in a zone that must be unmapped for zone_reclaim to
+ * occur.
+ */
+int sysctl_min_unmapped_ratio = 1;
+/*
+ * Priority for ZONE_RECLAIM. This determines the fraction of pages
+ * of a node considered for each zone_reclaim. 4 scans 1/16th of
+ * a zone.
+ */
+#define ZONE_RECLAIM_PRIORITY 4
+
+
+#define RECLAIM_OFF 0
+#define RECLAIM_ZONE (1<<0)	/* Run shrink_inactive_list on the zone */
+#define RECLAIM_WRITE (1<<1)	/* Writeout pages during reclaim */
+#define RECLAIM_SWAP (1<<2)	/* Swap pages out during reclaim */
+
+static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
+{
+	unsigned long file_mapped = zone_page_state(zone, NR_FILE_MAPPED);
+	unsigned long file_lru = zone_page_state(zone, NR_INACTIVE_FILE) +
+		zone_page_state(zone, NR_ACTIVE_FILE);
+
+	/*
+	 * It's possible for there to be more file mapped pages than
+	 * accounted for by the pages on the file LRU lists because
+	 * tmpfs pages accounted for as ANON can also be FILE_MAPPED
+	 */
+	return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
+}
+
+/*
+ * Helper function to reclaim unmapped pages, we might add something
+ * similar to this for slab cache as well. Currently this function
+ * is shared with __zone_reclaim()
+ */
+static inline void
+zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
+				unsigned long nr_pages)
+{
+	int priority;
+	/*
+	 * Free memory by calling shrink zone with increasing
+	 * priorities until we have enough memory freed.
+	 */
+	priority = ZONE_RECLAIM_PRIORITY;
+	do {
+		shrink_zone(priority, zone, sc);
+		priority--;
+	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
+}
+
+/*
+ * Routine to balance unmapped pages, inspired from the code under
+ * CONFIG_NUMA that does unmapped page and slab page control by keeping
+ * min_unmapped_pages in the zone. We currently reclaim just unmapped
+ * pages, slab control will come in soon, at which point this routine
+ * should be called balance cached pages
+ */
+static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
+						struct scan_control *sc)
+{
+	if (unmapped_page_control &&
+		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
+		struct scan_control nsc;
+		unsigned long nr_pages;
+
+		nsc = *sc;
+
+		nsc.swappiness = 0;
+		nsc.may_writepage = 0;
+		nsc.may_unmap = 0;
+		nsc.nr_reclaimed = 0;
+
+		nr_pages = zone_unmapped_file_pages(zone) -
+				zone->min_unmapped_pages;
+		/* Magically try to reclaim an eighth of the unmapped cache pages */
+		nr_pages >>= 3;
+
+		zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
+		return nsc.nr_reclaimed;
+	}
+	return 0;
+}
+
+#define UNMAPPED_PAGE_RATIO 16
+bool should_balance_unmapped_pages(struct zone *zone)
+{
+	if (unmapped_page_control &&
+		(zone_unmapped_file_pages(zone) >
+			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
+		return true;
+	return false;
+}
+
+/*
  * For kswapd, balance_pgdat() will work across all this node's zones until
  * they are all at high_wmark_pages(zone).
  *
@@ -2169,6 +2277,12 @@ loop_again:
 				shrink_active_list(SWAP_CLUSTER_MAX, zone,
 							&sc, priority, 0);
 
+			/*
+			 * We do unmapped page balancing once here and once
+			 * below, so that we don't lose out
+			 */
+			balance_unmapped_pages(priority, zone, &sc);
+
 			if (!zone_watermark_ok(zone, order,
 					high_wmark_pages(zone), 0, 0)) {
 				end_zone = i;
@@ -2206,6 +2320,12 @@ loop_again:
 			sc.nr_scanned = 0;
 
 			/*
+			 * Balance unmapped pages upfront, this should be
+			 * really cheap
+			 */
+			balance_unmapped_pages(priority, zone, &sc);
+
+			/*
 			 * Call soft limit reclaim before calling shrink_zone.
 			 * For now we ignore the return value
 			 */
@@ -2418,7 +2538,8 @@ void wakeup_kswapd(struct zone *zone, int order)
 		return;
 
 	pgdat = zone->zone_pgdat;
-	if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
+	if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0) &&
+		!should_balance_unmapped_pages(zone))
 		return;
 	if (pgdat->kswapd_max_order < order)
 		pgdat->kswapd_max_order = order;
@@ -2585,44 +2706,12 @@ module_init(kswapd_init)
  */
 int zone_reclaim_mode __read_mostly;
 
-#define RECLAIM_OFF 0
-#define RECLAIM_ZONE (1<<0)	/* Run shrink_inactive_list on the zone */
-#define RECLAIM_WRITE (1<<1)	/* Writeout pages during reclaim */
-#define RECLAIM_SWAP (1<<2)	/* Swap pages out during reclaim */
-
-/*
- * Priority for ZONE_RECLAIM. This determines the fraction of pages
- * of a node considered for each zone_reclaim. 4 scans 1/16th of
- * a zone.
- */
-#define ZONE_RECLAIM_PRIORITY 4
-
-/*
- * Percentage of pages in a zone that must be unmapped for zone_reclaim to
- * occur.
- */
-int sysctl_min_unmapped_ratio = 1;
-
 /*
  * If the number of slab pages in a zone grows beyond this percentage then
  * slab reclaim needs to occur.
  */
 int sysctl_min_slab_ratio = 5;
 
-static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
-{
-	unsigned long file_mapped = zone_page_state(zone, NR_FILE_MAPPED);
-	unsigned long file_lru = zone_page_state(zone, NR_INACTIVE_FILE) +
-		zone_page_state(zone, NR_ACTIVE_FILE);
-
-	/*
-	 * It's possible for there to be more file mapped pages than
-	 * accounted for by the pages on the file LRU lists because
-	 * tmpfs pages accounted for as ANON can also be FILE_MAPPED
-	 */
-	return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
-}
-
 /* Work out how many page cache pages we can reclaim in this reclaim_mode */
 static long zone_pagecache_reclaimable(struct zone *zone)
 {
@@ -2660,7 +2749,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	const unsigned long nr_pages = 1 << order;
 	struct task_struct *p = current;
 	struct reclaim_state reclaim_state;
-	int priority;
 	struct scan_control sc = {
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
@@ -2689,11 +2777,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		 * Free memory by calling shrink zone with increasing
 		 * priorities until we have enough memory freed.
 		 */
-		priority = ZONE_RECLAIM_PRIORITY;
-		do {
-			shrink_zone(priority, zone, &sc);
-			priority--;
-		} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
+		zone_reclaim_unmapped_pages(zone, &sc, nr_pages);
 	}
 
 	nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);

-- 
	Three Cheers,
	Balbir


* [RFC][PATCH 2/3] Linux/Guest cooperative unmapped page cache control
  2010-10-28 22:40 ` [Qemu-devel] " Balbir Singh
@ 2010-10-28 22:40   ` Balbir Singh
  -1 siblings, 0 replies; 18+ messages in thread
From: Balbir Singh @ 2010-10-28 22:40 UTC (permalink / raw)
  To: kvm; +Cc: linux-mm, Balbir Singh, qemu-devel

Balloon unmapped page cache pages first

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch builds on the ballooning infrastructure by ballooning
unmapped page cache pages first. It goes for the low-hanging fruit,
trying to reclaim clean unmapped pages before anything else.

This patch brings zone_reclaim() and other dependencies out of CONFIG_NUMA
and then reuses the zone_reclaim_mode logic if __GFP_FREE_CACHE is passed
in the gfp_mask. The virtio balloon driver has been changed to use
__GFP_FREE_CACHE. During fill_balloon(), the driver looks for a hint
provided by the hypervisor to reclaim cached memory. By default the hint
is off; it can be turned on by passing an argument that specifies that
we intend to reclaim cached memory.

Tests:

Test 1
------
I ran a simple filter function that frequently ballooned a single VM
running kernbench. The VM was configured with 2GB of memory and 2 VCPUs.
The filter was a triangular wave that continuously ballooned the VM
under study between 500MB and 1500MB (a sketch of such a driver loop
follows the Test 1 results below). The run times of the VM with and
without the changes are shown below; they show no significant impact
from the changes.

With changes

Elapsed Time 223.86 (1.52822)
User Time 191.01 (0.65395)
System Time 199.468 (2.43616)
Percent CPU 174 (1)
Context Switches 103182 (595.05)
Sleeps 39107.6 (1505.67)

Without changes

Elapsed Time 225.526 (2.93102)
User Time 193.53 (3.53626)
System Time 199.832 (3.26281)
Percent CPU 173.6 (1.14018)
Context Switches 103744 (1311.53)
Sleeps 39383.2 (831.865)

The key advantage was that it resulted in lesser RSS usage in the host and
more cached usage, indicating that the caching had been pushed towards
the host. The guest cached memory usage was lower and free memory in
the guest was also higher.
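
The balloon driver loop for this test was, in essence, the following
(a reconstruction, not the original script; it assumes an HMP monitor
listening on a UNIX socket at /tmp/vm0-monitor.sock, and the step and
interval values are illustrative):

    #!/bin/sh
    # Triangular wave: ramp the balloon target between 500MB and 1500MB
    lo=500 hi=1500 step=100 target=500 dir=1
    while true; do
        echo "balloon $target" | socat - UNIX-CONNECT:/tmp/vm0-monitor.sock
        target=$((target + dir * step))
        [ "$target" -ge "$hi" ] && dir=-1
        [ "$target" -le "$lo" ] && dir=1
        sleep 5
    done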

Test 2
------
I ran kernbench under the memory overcommit manager (6 VMs, each with
2 VCPUs and 2GB of memory) with KSM and ksmtuned enabled. Memory
overcommit manager details are at http://github.com/aglitke/mom/wiki.
The kernbench command line was kernbench -M.

The tests showed the following:

With changes

Elapsed Time 842.936 (12.2247)
Elapsed Time 844.266 (25.8047)
Elapsed Time 844.696 (11.2433)
Elapsed Time 846.08 (14.0249)
Elapsed Time 838.58 (7.44609)
Elapsed Time 842.362 (4.37463)

Without changes

Elapsed Time 837.604 (14.1311)
Elapsed Time 839.322 (17.1772)
Elapsed Time 843.744 (9.21541)
Elapsed Time 842.592 (7.48622)
Elapsed Time 844.272 (25.486)
Elapsed Time 838.858 (7.5044)

General observations:

1. Free memory in each of the guests was higher with the changes;
   the additional free memory was on the order of 120MB per VM.
2. Cached memory in each guest was lower with the changes.
3. Host free memory was almost constant (independent of the changes).
4. Host anonymous memory usage was lower with the changes.

The goal of this patch is to free up memory locked in duplicated
cache contents, and observation (1) above shows that we are able to
free it successfully.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 drivers/virtio/virtio_balloon.c |   17 +++++++++++++++--
 include/linux/gfp.h             |    8 +++++++-
 include/linux/swap.h            |    9 +++------
 include/linux/virtio_balloon.h  |    3 +++
 mm/page_alloc.c                 |    3 ++-
 mm/vmscan.c                     |    2 +-
 6 files changed, 31 insertions(+), 11 deletions(-)


diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0f1da45..70f97ea 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -99,12 +99,24 @@ static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
 
 static void fill_balloon(struct virtio_balloon *vb, size_t num)
 {
+	u32 reclaim_cache_first;
+	int err;
+	gfp_t mask = GFP_HIGHUSER | __GFP_NORETRY | __GFP_NOMEMALLOC |
+			__GFP_NOWARN;
+
+	err = virtio_config_val(vb->vdev, VIRTIO_BALLOON_F_BALLOON_HINT,
+				offsetof(struct virtio_balloon_config,
+						reclaim_cache_first),
+				&reclaim_cache_first);
+
+	if (!err && reclaim_cache_first)
+		mask |= __GFP_FREE_CACHE;
+
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
 	for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns++) {
-		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-					__GFP_NOMEMALLOC | __GFP_NOWARN);
+		struct page *page = alloc_page(mask);
 		if (!page) {
 			if (printk_ratelimit())
 				dev_printk(KERN_INFO, &vb->vdev->dev,
@@ -358,6 +370,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
 static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
+	VIRTIO_BALLOON_F_BALLOON_HINT,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 975609c..9048259 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -61,12 +61,18 @@ struct vm_area_struct;
 #endif
 
 /*
+ * While allocating pages, try to free cache pages first. Note the
+ * heavy dependency on zone_reclaim_mode logic
+ */
+#define __GFP_FREE_CACHE ((__force gfp_t)0x400000u) /* Free cache first */
+
+/*
  * This may seem redundant, but it's a way of annotating false positives vs.
  * allocations that simply cannot be supported (e.g. page tables).
  */
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
 
-#define __GFP_BITS_SHIFT 22	/* Room for 22 __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 23	/* Room for 23 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* This equals 0, but use constants in case they ever change */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 5d29097..e77db75 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -254,16 +254,13 @@ extern long vm_total_pages;
 extern bool should_balance_unmapped_pages(struct zone *zone);
 
 extern int sysctl_min_unmapped_ratio;
-#ifdef CONFIG_NUMA
-extern int zone_reclaim_mode;
 extern int sysctl_min_slab_ratio;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
+
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
 #else
 #define zone_reclaim_mode 0
-static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
-{
-	return 0;
-}
 #endif
 
 extern int page_evictable(struct page *page, struct vm_area_struct *vma);
diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
index a50ecd1..6e405b4 100644
--- a/include/linux/virtio_balloon.h
+++ b/include/linux/virtio_balloon.h
@@ -8,6 +8,7 @@
 /* The feature bitmap for virtio balloon */
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
+#define VIRTIO_BALLOON_F_BALLOON_HINT	2 /* Reclaim hint */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -18,6 +19,8 @@ struct virtio_balloon_config
 	__le32 num_pages;
 	/* Number of pages we've actually got in balloon. */
 	__le32 actual;
+	/* Hint, should we reclaim cached pages first? */
+	__le32 reclaim_cache_first;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d8fe29f..2cdf4a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1650,7 +1650,8 @@ zonelist_scan:
 				    classzone_idx, alloc_flags))
 				goto try_this_zone;
 
-			if (zone_reclaim_mode == 0)
+			if (zone_reclaim_mode == 0 &&
+				!(gfp_mask & __GFP_FREE_CACHE))
 				goto this_zone_full;
 
 			ret = zone_reclaim(zone, gfp_mask, order);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 02346ad..9a11e5a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2705,6 +2705,7 @@ module_init(kswapd_init)
  * the watermarks.
  */
 int zone_reclaim_mode __read_mostly;
+#endif
 
 /*
  * If the number of slab pages in a zone grows beyond this percentage then
@@ -2870,7 +2871,6 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 
 	return ret;
 }
-#endif
 
 /*
  * page_evictable - test whether a page is evictable

-- 
	Three Cheers,
	Balbir


* [RFC][PATCH 3/3] QEmu changes to provide balloon hint
  2010-10-28 22:40 ` [Qemu-devel] " Balbir Singh
@ 2010-10-28 22:40   ` Balbir Singh
  -1 siblings, 0 replies; 18+ messages in thread
From: Balbir Singh @ 2010-10-28 22:40 UTC (permalink / raw)
  To: kvm; +Cc: linux-mm, Balbir Singh, qemu-devel

Provide memory hint during ballooning

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch adds an optional hint to the QEMU monitor balloon
command. The hint tells the guest operating system to consider
a class of memory during reclaim. Currently the only supported
hint is cached memory. The design is generic and can be extended
to provide other hints in the future if required.
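
With the patch applied, the hint is an optional extra argument to the
existing monitor command (the target value below is illustrative):

    (qemu) balloon 1024 cache    # balloon to 1024 MB, reclaim cached pages first
    (qemu) balloon 1024          # same target, no hint; guest default behaviour

If the hint string is omitted, reclaim_cache_first stays zero and the
guest behaves exactly as before.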

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 balloon.c           |   18 ++++++++++++++----
 balloon.h           |    4 +++-
 hmp-commands.hx     |    7 +++++--
 hw/virtio-balloon.c |   15 +++++++++++----
 hw/virtio-balloon.h |    3 +++
 qmp-commands.hx     |    7 ++++---
 6 files changed, 40 insertions(+), 14 deletions(-)


diff --git a/balloon.c b/balloon.c
index 0021fef..b2bdda5 100644
--- a/balloon.c
+++ b/balloon.c
@@ -41,11 +41,13 @@ void qemu_add_balloon_handler(QEMUBalloonEvent *func, void *opaque)
     qemu_balloon_event_opaque = opaque;
 }
 
-int qemu_balloon(ram_addr_t target, MonitorCompletion cb, void *opaque)
+int qemu_balloon(ram_addr_t target, bool reclaim_cache_first,
+                 MonitorCompletion cb, void *opaque)
 {
     if (qemu_balloon_event) {
         trace_balloon_event(qemu_balloon_event_opaque, target);
-        qemu_balloon_event(qemu_balloon_event_opaque, target, cb, opaque);
+        qemu_balloon_event(qemu_balloon_event_opaque, target,
+                           reclaim_cache_first, cb, opaque);
         return 1;
     } else {
         return 0;
@@ -55,7 +57,7 @@ int qemu_balloon(ram_addr_t target, MonitorCompletion cb, void *opaque)
 int qemu_balloon_status(MonitorCompletion cb, void *opaque)
 {
     if (qemu_balloon_event) {
-        qemu_balloon_event(qemu_balloon_event_opaque, 0, cb, opaque);
+        qemu_balloon_event(qemu_balloon_event_opaque, 0, 0, cb, opaque);
         return 1;
     } else {
         return 0;
@@ -131,13 +133,21 @@ int do_balloon(Monitor *mon, const QDict *params,
 	       MonitorCompletion cb, void *opaque)
 {
     int ret;
+    int val;
+    const char *cache_hint;
+    int reclaim_cache_first = 0;
 
     if (kvm_enabled() && !kvm_has_sync_mmu()) {
         qerror_report(QERR_KVM_MISSING_CAP, "synchronous MMU", "balloon");
         return -1;
     }
 
-    ret = qemu_balloon(qdict_get_int(params, "value"), cb, opaque);
+    val = qdict_get_int(params, "value");
+    cache_hint = qdict_get_try_str(params, "hint");
+    if (cache_hint)
+        reclaim_cache_first = 1;
+
+    ret = qemu_balloon(val, reclaim_cache_first, cb, opaque);
     if (ret == 0) {
         qerror_report(QERR_DEVICE_NOT_ACTIVE, "balloon");
         return -1;
diff --git a/balloon.h b/balloon.h
index d478e28..65d68c1 100644
--- a/balloon.h
+++ b/balloon.h
@@ -17,11 +17,13 @@
 #include "monitor.h"
 
 typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target,
+                                bool reclaim_cache_first,
                                 MonitorCompletion cb, void *cb_data);
 
 void qemu_add_balloon_handler(QEMUBalloonEvent *func, void *opaque);
 
-int qemu_balloon(ram_addr_t target, MonitorCompletion cb, void *opaque);
+int qemu_balloon(ram_addr_t target, bool reclaim_cache_first,
+                 MonitorCompletion cb, void *opaque);
 
 int qemu_balloon_status(MonitorCompletion cb, void *opaque);
 
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 81999aa..80e42aa 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -925,8 +925,8 @@ ETEXI
 
     {
         .name       = "balloon",
-        .args_type  = "value:M",
-        .params     = "target",
+        .args_type  = "value:M,hint:s?",
+        .params     = "target [cache]",
         .help       = "request VM to change its memory allocation (in MB)",
         .user_print = monitor_user_noop,
         .mhandler.cmd_async = do_balloon,
@@ -937,6 +937,9 @@ STEXI
 @item balloon @var{value}
 @findex balloon
 Request VM to change its memory allocation to @var{value} (in MB).
+An optional @var{hint} can be specified to indicate that the guest
+should reclaim its cached memory first. The @var{hint} may be
+ignored by the guest.
 ETEXI
 
     {
diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index 8adddea..e363507 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -44,6 +44,7 @@ typedef struct VirtIOBalloon
     size_t stats_vq_offset;
     MonitorCompletion *stats_callback;
     void *stats_opaque_callback_data;
+    uint32_t reclaim_cache_first;
 } VirtIOBalloon;
 
 static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev)
@@ -181,8 +182,11 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
     config.num_pages = cpu_to_le32(dev->num_pages);
     config.actual = cpu_to_le32(dev->actual);
-
-    memcpy(config_data, &config, 8);
+    if (vdev->guest_features & (1 << VIRTIO_BALLOON_F_BALLOON_HINT)) {
+        config.reclaim_cache_first = cpu_to_le32(dev->reclaim_cache_first);
+        memcpy(config_data, &config, 12);
+    } else
+        memcpy(config_data, &config, 8);
 }
 
 static void virtio_balloon_set_config(VirtIODevice *vdev,
@@ -196,11 +200,13 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 
 static uint32_t virtio_balloon_get_features(VirtIODevice *vdev, uint32_t f)
 {
-    f |= (1 << VIRTIO_BALLOON_F_STATS_VQ);
+    f |= (1 << VIRTIO_BALLOON_F_STATS_VQ) |
+         (1 << VIRTIO_BALLOON_F_BALLOON_HINT);
     return f;
 }
 
 static void virtio_balloon_to_target(void *opaque, ram_addr_t target,
+                                     bool reclaim_cache_first,
                                      MonitorCompletion cb, void *cb_data)
 {
     VirtIOBalloon *dev = opaque;
@@ -210,6 +216,7 @@ static void virtio_balloon_to_target(void *opaque, ram_addr_t target,
 
     if (target) {
         dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT;
+        dev->reclaim_cache_first = reclaim_cache_first;
         virtio_notify_config(&dev->vdev);
     } else {
         /* For now, only allow one request at a time.  This restriction can be
@@ -263,7 +270,7 @@ VirtIODevice *virtio_balloon_init(DeviceState *dev)
 
     s = (VirtIOBalloon *)virtio_common_init("virtio-balloon",
                                             VIRTIO_ID_BALLOON,
-                                            8, sizeof(VirtIOBalloon));
+                                            12, sizeof(VirtIOBalloon));
 
     s->vdev.get_config = virtio_balloon_get_config;
     s->vdev.set_config = virtio_balloon_set_config;
diff --git a/hw/virtio-balloon.h b/hw/virtio-balloon.h
index e20cf6b..39d1b01 100644
--- a/hw/virtio-balloon.h
+++ b/hw/virtio-balloon.h
@@ -26,6 +26,7 @@
 /* The feature bitmap for virtio balloon */
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ 1       /* Memory stats virtqueue */
+#define VIRTIO_BALLOON_F_BALLOON_HINT 2   /* Balloon hint */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -36,6 +37,8 @@ struct virtio_balloon_config
     uint32_t num_pages;
     /* Number of pages we've actually got in balloon. */
     uint32_t actual;
+    /* Hint, should we reclaim cached pages first? */
+    uint32_t reclaim_cache_first;
 };
 
 /* Memory Statistics */
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 793cf1c..1da2e65 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -605,8 +605,8 @@ EQMP
 
     {
         .name       = "balloon",
-        .args_type  = "value:M",
-        .params     = "target",
+        .args_type  = "value:M,hint:s?",
+        .params     = "target [cache]",
         .help       = "request VM to change its memory allocation (in MB)",
         .user_print = monitor_user_noop,
         .mhandler.cmd_async = do_balloon,
@@ -622,10 +622,11 @@ Request VM to change its memory allocation (in bytes).
 Arguments:
 
 - "value": New memory allocation (json-int)
+- "hint": Optional hint (json-string, optional)
 
 Example:
 
--> { "execute": "balloon", "arguments": { "value": 536870912 } }
++-> { "execute": "balloon", "arguments": { "value": 536870912, "hint":"cache" } }
 <- { "return": {} }
 
 EQMP
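
The guest-side consumer of reclaim_cache_first is in patch 2/3; as
a sketch of the read path (assuming the kernel's virtio config API
of this era, with vb being the driver's struct virtio_balloon):

	u32 cache_first = 0;

	/* Hypothetical: the field is only present once the feature
	 * bit has been negotiated; otherwise the config space is
	 * just the original 8 bytes. */
	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_BALLOON_HINT))
		vb->vdev->config->get(vb->vdev,
				      offsetof(struct virtio_balloon_config,
					       reclaim_cache_first),
				      &cache_first, sizeof(cache_first));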

-- 
	Three Cheers,
	Balbir

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 1/3] Linux/Guest unmapped page cache control
  2010-10-28 22:40   ` [Qemu-devel] " Balbir Singh
@ 2010-11-03 14:35     ` Christoph Lameter
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Lameter @ 2010-11-03 14:35 UTC (permalink / raw)
  To: Balbir Singh; +Cc: kvm, linux-mm, qemu-devel

On Fri, 29 Oct 2010, Balbir Singh wrote:

> A lot of the code is borrowed from zone_reclaim_mode logic for
> __zone_reclaim(). One might argue that with ballooning and
> KSM this feature is not very useful, but even with ballooning,

Interesting use of zone reclaim. I am having a difficult time reviewing
the patch since you move and modify functions at the same time. Could you
separate that out a bit?

> +#define UNMAPPED_PAGE_RATIO 16

Maybe come up with a scheme that allows better configuration of the
minimum? I think in some settings we may want an absolute limit and in
others a fraction of something (total zone size or working set?)


> +bool should_balance_unmapped_pages(struct zone *zone)
> +{
> +	if (unmapped_page_control &&
> +		(zone_unmapped_file_pages(zone) >
> +			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
> +		return true;
> +	return false;
> +}
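
For instance (purely illustrative, names invented): keep the ratio
as the default, but let an absolute per-zone floor win when the
admin sets one:

	/* hypothetical sysctl; an absolute floor overrides the ratio */
	unsigned long floor = sysctl_min_unmapped_pages ?:
			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages;

	return unmapped_page_control &&
	       zone_unmapped_file_pages(zone) > floor;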


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 1/3] Linux/Guest unmapped page cache control
  2010-11-03 14:35     ` [Qemu-devel] " Christoph Lameter
@ 2010-11-03 17:17       ` Balbir Singh
  -1 siblings, 0 replies; 18+ messages in thread
From: Balbir Singh @ 2010-11-03 17:17 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: kvm, linux-mm, qemu-devel

* Christoph Lameter <cl@linux.com> [2010-11-03 09:35:33]:

> On Fri, 29 Oct 2010, Balbir Singh wrote:
> 
> > A lot of the code is borrowed from zone_reclaim_mode logic for
> > __zone_reclaim(). One might argue that with ballooning and
> > KSM this feature is not very useful, but even with ballooning,
> 
> Interesting use of zone reclaim. I am having a difficult time reviewing
> the patch since you move and modify functions at the same time. Could you
> separate that out a bit?
>

Sure, I'll split it out into more readable bits and repost the mm
versions first.
 
> > +#define UNMAPPED_PAGE_RATIO 16
> 
> Maybe come up with a scheme that allows better configuration of the
> minimum? I think in some settings we may want an absolute limit and in
> others a fraction of something (total zone size or working set?)
>

Are you suggesting a sysctl, or a computation based on zone size,
limits, etc.? I understand it to be the latter.
 
> 
> > +bool should_balance_unmapped_pages(struct zone *zone)
> > +{
> > +	if (unmapped_page_control &&
> > +		(zone_unmapped_file_pages(zone) >
> > +			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
> > +		return true;
> > +	return false;
> > +}
> 

Thanks for your review.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 1/3] Linux/Guest unmapped page cache control
  2010-11-03 17:17       ` Balbir Singh
@ 2010-11-03 17:58         ` Christoph Lameter
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Lameter @ 2010-11-03 17:58 UTC (permalink / raw)
  To: Balbir Singh; +Cc: kvm, linux-mm, qemu-devel

On Wed, 3 Nov 2010, Balbir Singh wrote:

> > > +#define UNMAPPED_PAGE_RATIO 16
> >
> > Maybe come up with a scheme that allows better configuration of the
> > minimum? I think in some settings we may want an absolute limit and in
> > others a fraction of something (total zone size or working set?)
> >
>
> Are you suggesting a sysctl or computation based on zone size and
> limit, etc? I understand it to be the latter.

Do a computation based on zone size on startup and then allow the
user to modify the absolute size of the page cache?


Hmmm.. That would have to be per zone/node or somehow distributed over all
zones/nodes.
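
(Purely illustrative: e.g. one global limit split by each zone's
share of memory,

	zone->max_unmapped_pages = global_unmapped_limit *
			zone->present_pages / totalram_pages;

recomputed whenever the knob changes. Both max_unmapped_pages and
global_unmapped_limit are invented names here.)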



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-11-03 17:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-28 22:40 [RFC][PATCH 0/3] KVM page cache optimization (v3) Balbir Singh
2010-10-28 22:40 ` [Qemu-devel] " Balbir Singh
2010-10-28 22:40 ` [RFC][PATCH 1/3] Linux/Guest unmapped page cache control Balbir Singh
2010-10-28 22:40   ` [Qemu-devel] " Balbir Singh
2010-11-03 14:35   ` Christoph Lameter
2010-11-03 14:35     ` [Qemu-devel] " Christoph Lameter
2010-11-03 17:17     ` Balbir Singh
2010-11-03 17:17       ` [Qemu-devel] " Balbir Singh
2010-11-03 17:17       ` Balbir Singh
2010-11-03 17:58       ` Christoph Lameter
2010-11-03 17:58         ` [Qemu-devel] " Christoph Lameter
2010-11-03 17:58         ` Christoph Lameter
2010-10-28 22:40 ` [RFC][PATCH 2/3] Linux/Guest cooperative " Balbir Singh
2010-10-28 22:40   ` [Qemu-devel] " Balbir Singh
2010-10-28 22:40   ` Balbir Singh
2010-10-28 22:40 ` [RFC][PATCH 3/3] QEmu changes to provide balloon hint Balbir Singh
2010-10-28 22:40   ` [Qemu-devel] " Balbir Singh
2010-10-28 22:40   ` Balbir Singh
