All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch v3] swap: virtual swap readahead
@ 2009-06-09 19:01 ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-09 19:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Hugh Dickins, Andi Kleen, Wu Fengguang,
	KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel

[resend with lists cc'd, sorry]

Hi,

here is a new iteration of the virtual swap readahead.  Per Hugh's
suggestion, I moved the pte collecting to the callsite and thus out
ouf swap code.  Unfortunately, I had to bound page_cluster due to an
array of that many swap entries on the stack, but I think it is better
to limit the cluster size to a sane maximum than using dynamic
allocation for this purpose.

Thanks all for the helpful suggestions.  KAMEZAWA-san and Minchan, I
didn't incorporate your ideas in this patch as I think they belong in
a different one with their own justifications.  I didn't ignore them.

       Hannes

---
The current swap readahead implementation reads a physically
contiguous group of swap slots around the faulting page to take
advantage of the disk head's position and in the hope that the
surrounding pages will be needed soon as well.

This works as long as the physical swap slot order approximates the
LRU order decently, otherwise it wastes memory and IO bandwidth to
read in pages that are unlikely to be needed soon.

However, the physical swap slot layout diverges from the LRU order
with increasing swap activity, i.e. high memory pressure situations,
and this is exactly the situation where swapin should not waste any
memory or IO bandwidth as both are the most contended resources at
this point.

Another approximation for LRU-relation is the VMA order as groups of
VMA-related pages are usually used together.

This patch combines both the physical and the virtual hint to get a
good approximation of pages that are sensible to read ahead.

When both diverge, we either read unrelated data, seek heavily for
related data, or, what this patch does, just decrease the readahead
efforts.

To achieve this, we have essentially two readahead windows of the same
size: one spans the virtual, the other one the physical neighborhood
of the faulting page.  We only read where both areas overlap.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
---
 include/linux/swap.h |    4 ++-
 kernel/sysctl.c      |    7 ++++-
 mm/memory.c          |   55 +++++++++++++++++++++++++++++++++++++++++
 mm/shmem.c           |    4 +--
 mm/swap_state.c      |   67 ++++++++++++++++++++++++++++++++++++++-------------
 5 files changed, 116 insertions(+), 21 deletions(-)

version 3:
  o move pte selection to callee (per Hugh)
  o limit ra ptes to one pmd entry to avoid multiple
    locking/mapping of highptes (per Hugh)

version 2:
  o fall back to physical ra window for shmem
  o add documentation to the new ra algorithm (per Andrew)

--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -327,27 +327,14 @@ struct page *read_swap_cache_async(swp_e
 	return found_page;
 }
 
-/**
- * swapin_readahead - swap in pages in hope we need them soon
- * @entry: swap entry of this memory
- * @gfp_mask: memory allocation flags
- * @vma: user vma this address belongs to
- * @addr: target address for mempolicy
- *
- * Returns the struct page for entry and addr, after queueing swapin.
- *
+/*
  * Primitive swap readahead code. We simply read an aligned block of
  * (1 << page_cluster) entries in the swap area. This method is chosen
  * because it doesn't cost us any seek time.  We also make sure to queue
  * the 'original' request together with the readahead ones...
- *
- * This has been extended to use the NUMA policies from the mm triggering
- * the readahead.
- *
- * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
  */
-struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
-			struct vm_area_struct *vma, unsigned long addr)
+static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
+				struct vm_area_struct *vma, unsigned long addr)
 {
 	int nr_pages;
 	struct page *page;
@@ -373,3 +360,51 @@ struct page *swapin_readahead(swp_entry_
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 	return read_swap_cache_async(entry, gfp_mask, vma, addr);
 }
+
+/**
+ * swapin_readahead - swap in pages in hope we need them soon
+ * @entry: swap entry of this memory
+ * @gfp_mask: memory allocation flags
+ * @vma: user vma this address belongs to
+ * @addr: target address for mempolicy
+ * @entries: swap slots to consider reading
+ * @nr_entries: number of @entries
+ * @cluster: readahead window size in swap slots
+ *
+ * Returns the struct page for entry and addr, after queueing swapin.
+ *
+ * This has been extended to use the NUMA policies from the mm
+ * triggering the readahead.
+ *
+ * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
+ */
+struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
+			struct vm_area_struct *vma, unsigned long addr,
+			swp_entry_t *entries, int nr_entries,
+			unsigned long cluster)
+{
+	unsigned long pmin, pmax;
+	int i;
+
+	if (!entries)	/* XXX: shmem case */
+		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
+	pmin = swp_offset(entry) & ~(cluster - 1);
+	pmax = pmin + cluster;
+	for (i = 0; i < nr_entries; i++) {
+		swp_entry_t swp = entries[i];
+		struct page *page;
+
+		if (swp_type(swp) != swp_type(entry))
+			continue;
+		if (swp_offset(swp) > pmax)
+			continue;
+		if (swp_offset(swp) < pmin)
+			continue;
+		page = read_swap_cache_async(swp, gfp_mask, vma, addr);
+		if (!page)
+			break;
+		page_cache_release(page);
+	}
+	lru_add_drain();	/* Push any new pages onto the LRU now */
+	return read_swap_cache_async(entry, gfp_mask, vma, addr);
+}
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -292,7 +292,9 @@ extern struct page *lookup_swap_cache(sw
 extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
 			struct vm_area_struct *vma, unsigned long addr);
 extern struct page *swapin_readahead(swp_entry_t, gfp_t,
-			struct vm_area_struct *vma, unsigned long addr);
+			struct vm_area_struct *vma, unsigned long addr,
+			swp_entry_t *entries, int nr_entries,
+			unsigned long cluster);
 
 /* linux/mm/swapfile.c */
 extern long nr_swap_pages;
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2440,6 +2440,54 @@ int vmtruncate_range(struct inode *inode
 }
 
 /*
+ * The readahead window is the virtual area around the faulting page,
+ * where the physical proximity of the swap slots is taken into
+ * account as well in swapin_readahead().
+ *
+ * While the swap allocation algorithm tries to keep LRU-related pages
+ * together on the swap backing, it is not reliable on heavy thrashing
+ * systems where concurrent reclaimers allocate swap slots and/or most
+ * anonymous memory pages are already in swap cache.
+ *
+ * On the virtual side, subgroups of VMA-related pages are usually
+ * used together, which gives another hint to LRU relationship.
+ *
+ * By taking both aspects into account, we get a good approximation of
+ * which pages are sensible to read together with the faulting one.
+ */
+static int swap_readahead_ptes(struct mm_struct *mm,
+			unsigned long addr, pmd_t *pmd,
+			swp_entry_t *entries,
+			unsigned long cluster)
+{
+	unsigned long window, min, max, limit;
+	spinlock_t *ptl;
+	pte_t *ptep;
+	int i, nr;
+
+	window = cluster << PAGE_SHIFT;
+	min = addr & ~(window - 1);
+	max = min + cluster;
+	/*
+	 * To keep the locking/highpte mapping simple, stay
+	 * within the PTE range of one PMD entry.
+	 */
+	limit = addr & PMD_MASK;
+	if (limit > min)
+		min = limit;
+	limit = pmd_addr_end(addr, max);
+	if (limit < max)
+		max = limit;
+	limit = max - min;
+	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
+	for (i = nr = 0; i < limit; i++)
+		if (is_swap_pte(ptep[i]))
+			entries[nr++] = pte_to_swp_entry(ptep[i]);
+	pte_unmap_unlock(ptep, ptl);
+	return nr;
+}
+
+/*
  * We enter with non-exclusive mmap_sem (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
@@ -2466,9 +2514,14 @@ static int do_swap_page(struct mm_struct
 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
 	page = lookup_swap_cache(entry);
 	if (!page) {
+		int nr, cluster = 1 << page_cluster;
+		swp_entry_t entries[cluster];
+
 		grab_swap_token(); /* Contend for token _before_ read-in */
+		nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
 		page = swapin_readahead(entry,
-					GFP_HIGHUSER_MOVABLE, vma, address);
+					GFP_HIGHUSER_MOVABLE, vma, address,
+					entries, nr, cluster);
 		if (!page) {
 			/*
 			 * Back out if somebody else faulted in this pte
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1148,7 +1148,7 @@ static struct page *shmem_swapin(swp_ent
 	pvma.vm_pgoff = idx;
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = spol;
-	page = swapin_readahead(entry, gfp, &pvma, 0);
+	page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
 	return page;
 }
 
@@ -1178,7 +1178,7 @@ static inline void shmem_show_mpol(struc
 static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
 			struct shmem_inode_info *info, unsigned long idx)
 {
-	return swapin_readahead(entry, gfp, NULL, 0);
+	return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
 }
 
 static inline struct page *shmem_alloc_page(gfp_t gfp,
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -112,6 +112,8 @@ static int min_percpu_pagelist_fract = 8
 
 static int ngroups_max = NGROUPS_MAX;
 
+static int page_cluster_max = 5;
+
 #ifdef CONFIG_MODULES
 extern char modprobe_path[];
 #endif
@@ -966,7 +968,10 @@ static struct ctl_table vm_table[] = {
 		.data		= &page_cluster,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+		.extra2		= &page_cluster_max,
 	},
 	{
 		.ctl_name	= VM_DIRTY_BACKGROUND,

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [patch v3] swap: virtual swap readahead
@ 2009-06-09 19:01 ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-09 19:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Hugh Dickins, Andi Kleen, Wu Fengguang,
	KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel

[resend with lists cc'd, sorry]

Hi,

here is a new iteration of the virtual swap readahead.  Per Hugh's
suggestion, I moved the pte collecting to the callsite and thus out
ouf swap code.  Unfortunately, I had to bound page_cluster due to an
array of that many swap entries on the stack, but I think it is better
to limit the cluster size to a sane maximum than using dynamic
allocation for this purpose.

Thanks all for the helpful suggestions.  KAMEZAWA-san and Minchan, I
didn't incorporate your ideas in this patch as I think they belong in
a different one with their own justifications.  I didn't ignore them.

       Hannes

---
The current swap readahead implementation reads a physically
contiguous group of swap slots around the faulting page to take
advantage of the disk head's position and in the hope that the
surrounding pages will be needed soon as well.

This works as long as the physical swap slot order approximates the
LRU order decently, otherwise it wastes memory and IO bandwidth to
read in pages that are unlikely to be needed soon.

However, the physical swap slot layout diverges from the LRU order
with increasing swap activity, i.e. high memory pressure situations,
and this is exactly the situation where swapin should not waste any
memory or IO bandwidth as both are the most contended resources at
this point.

Another approximation for LRU-relation is the VMA order as groups of
VMA-related pages are usually used together.

This patch combines both the physical and the virtual hint to get a
good approximation of pages that are sensible to read ahead.

When both diverge, we either read unrelated data, seek heavily for
related data, or, what this patch does, just decrease the readahead
efforts.

To achieve this, we have essentially two readahead windows of the same
size: one spans the virtual, the other one the physical neighborhood
of the faulting page.  We only read where both areas overlap.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
---
 include/linux/swap.h |    4 ++-
 kernel/sysctl.c      |    7 ++++-
 mm/memory.c          |   55 +++++++++++++++++++++++++++++++++++++++++
 mm/shmem.c           |    4 +--
 mm/swap_state.c      |   67 ++++++++++++++++++++++++++++++++++++++-------------
 5 files changed, 116 insertions(+), 21 deletions(-)

version 3:
  o move pte selection to callee (per Hugh)
  o limit ra ptes to one pmd entry to avoid multiple
    locking/mapping of highptes (per Hugh)

version 2:
  o fall back to physical ra window for shmem
  o add documentation to the new ra algorithm (per Andrew)

--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -327,27 +327,14 @@ struct page *read_swap_cache_async(swp_e
 	return found_page;
 }
 
-/**
- * swapin_readahead - swap in pages in hope we need them soon
- * @entry: swap entry of this memory
- * @gfp_mask: memory allocation flags
- * @vma: user vma this address belongs to
- * @addr: target address for mempolicy
- *
- * Returns the struct page for entry and addr, after queueing swapin.
- *
+/*
  * Primitive swap readahead code. We simply read an aligned block of
  * (1 << page_cluster) entries in the swap area. This method is chosen
  * because it doesn't cost us any seek time.  We also make sure to queue
  * the 'original' request together with the readahead ones...
- *
- * This has been extended to use the NUMA policies from the mm triggering
- * the readahead.
- *
- * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
  */
-struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
-			struct vm_area_struct *vma, unsigned long addr)
+static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
+				struct vm_area_struct *vma, unsigned long addr)
 {
 	int nr_pages;
 	struct page *page;
@@ -373,3 +360,51 @@ struct page *swapin_readahead(swp_entry_
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 	return read_swap_cache_async(entry, gfp_mask, vma, addr);
 }
+
+/**
+ * swapin_readahead - swap in pages in hope we need them soon
+ * @entry: swap entry of this memory
+ * @gfp_mask: memory allocation flags
+ * @vma: user vma this address belongs to
+ * @addr: target address for mempolicy
+ * @entries: swap slots to consider reading
+ * @nr_entries: number of @entries
+ * @cluster: readahead window size in swap slots
+ *
+ * Returns the struct page for entry and addr, after queueing swapin.
+ *
+ * This has been extended to use the NUMA policies from the mm
+ * triggering the readahead.
+ *
+ * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
+ */
+struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
+			struct vm_area_struct *vma, unsigned long addr,
+			swp_entry_t *entries, int nr_entries,
+			unsigned long cluster)
+{
+	unsigned long pmin, pmax;
+	int i;
+
+	if (!entries)	/* XXX: shmem case */
+		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
+	pmin = swp_offset(entry) & ~(cluster - 1);
+	pmax = pmin + cluster;
+	for (i = 0; i < nr_entries; i++) {
+		swp_entry_t swp = entries[i];
+		struct page *page;
+
+		if (swp_type(swp) != swp_type(entry))
+			continue;
+		if (swp_offset(swp) > pmax)
+			continue;
+		if (swp_offset(swp) < pmin)
+			continue;
+		page = read_swap_cache_async(swp, gfp_mask, vma, addr);
+		if (!page)
+			break;
+		page_cache_release(page);
+	}
+	lru_add_drain();	/* Push any new pages onto the LRU now */
+	return read_swap_cache_async(entry, gfp_mask, vma, addr);
+}
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -292,7 +292,9 @@ extern struct page *lookup_swap_cache(sw
 extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
 			struct vm_area_struct *vma, unsigned long addr);
 extern struct page *swapin_readahead(swp_entry_t, gfp_t,
-			struct vm_area_struct *vma, unsigned long addr);
+			struct vm_area_struct *vma, unsigned long addr,
+			swp_entry_t *entries, int nr_entries,
+			unsigned long cluster);
 
 /* linux/mm/swapfile.c */
 extern long nr_swap_pages;
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2440,6 +2440,54 @@ int vmtruncate_range(struct inode *inode
 }
 
 /*
+ * The readahead window is the virtual area around the faulting page,
+ * where the physical proximity of the swap slots is taken into
+ * account as well in swapin_readahead().
+ *
+ * While the swap allocation algorithm tries to keep LRU-related pages
+ * together on the swap backing, it is not reliable on heavy thrashing
+ * systems where concurrent reclaimers allocate swap slots and/or most
+ * anonymous memory pages are already in swap cache.
+ *
+ * On the virtual side, subgroups of VMA-related pages are usually
+ * used together, which gives another hint to LRU relationship.
+ *
+ * By taking both aspects into account, we get a good approximation of
+ * which pages are sensible to read together with the faulting one.
+ */
+static int swap_readahead_ptes(struct mm_struct *mm,
+			unsigned long addr, pmd_t *pmd,
+			swp_entry_t *entries,
+			unsigned long cluster)
+{
+	unsigned long window, min, max, limit;
+	spinlock_t *ptl;
+	pte_t *ptep;
+	int i, nr;
+
+	window = cluster << PAGE_SHIFT;
+	min = addr & ~(window - 1);
+	max = min + cluster;
+	/*
+	 * To keep the locking/highpte mapping simple, stay
+	 * within the PTE range of one PMD entry.
+	 */
+	limit = addr & PMD_MASK;
+	if (limit > min)
+		min = limit;
+	limit = pmd_addr_end(addr, max);
+	if (limit < max)
+		max = limit;
+	limit = max - min;
+	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
+	for (i = nr = 0; i < limit; i++)
+		if (is_swap_pte(ptep[i]))
+			entries[nr++] = pte_to_swp_entry(ptep[i]);
+	pte_unmap_unlock(ptep, ptl);
+	return nr;
+}
+
+/*
  * We enter with non-exclusive mmap_sem (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
@@ -2466,9 +2514,14 @@ static int do_swap_page(struct mm_struct
 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
 	page = lookup_swap_cache(entry);
 	if (!page) {
+		int nr, cluster = 1 << page_cluster;
+		swp_entry_t entries[cluster];
+
 		grab_swap_token(); /* Contend for token _before_ read-in */
+		nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
 		page = swapin_readahead(entry,
-					GFP_HIGHUSER_MOVABLE, vma, address);
+					GFP_HIGHUSER_MOVABLE, vma, address,
+					entries, nr, cluster);
 		if (!page) {
 			/*
 			 * Back out if somebody else faulted in this pte
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1148,7 +1148,7 @@ static struct page *shmem_swapin(swp_ent
 	pvma.vm_pgoff = idx;
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = spol;
-	page = swapin_readahead(entry, gfp, &pvma, 0);
+	page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
 	return page;
 }
 
@@ -1178,7 +1178,7 @@ static inline void shmem_show_mpol(struc
 static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
 			struct shmem_inode_info *info, unsigned long idx)
 {
-	return swapin_readahead(entry, gfp, NULL, 0);
+	return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
 }
 
 static inline struct page *shmem_alloc_page(gfp_t gfp,
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -112,6 +112,8 @@ static int min_percpu_pagelist_fract = 8
 
 static int ngroups_max = NGROUPS_MAX;
 
+static int page_cluster_max = 5;
+
 #ifdef CONFIG_MODULES
 extern char modprobe_path[];
 #endif
@@ -966,7 +968,10 @@ static struct ctl_table vm_table[] = {
 		.data		= &page_cluster,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+		.extra2		= &page_cluster_max,
 	},
 	{
 		.ctl_name	= VM_DIRTY_BACKGROUND,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-09 19:01 ` Johannes Weiner
@ 2009-06-09 19:37   ` Johannes Weiner
  -1 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-09 19:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Hugh Dickins, Andi Kleen, Wu Fengguang,
	KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel

On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> [resend with lists cc'd, sorry]

[and fixed Hugh's email.  crap]

> Hi,
> 
> here is a new iteration of the virtual swap readahead.  Per Hugh's
> suggestion, I moved the pte collecting to the callsite and thus out
> ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> array of that many swap entries on the stack, but I think it is better
> to limit the cluster size to a sane maximum than using dynamic
> allocation for this purpose.
> 
> Thanks all for the helpful suggestions.  KAMEZAWA-san and Minchan, I
> didn't incorporate your ideas in this patch as I think they belong in
> a different one with their own justifications.  I didn't ignore them.
> 
>        Hannes
> 
> ---
> The current swap readahead implementation reads a physically
> contiguous group of swap slots around the faulting page to take
> advantage of the disk head's position and in the hope that the
> surrounding pages will be needed soon as well.
> 
> This works as long as the physical swap slot order approximates the
> LRU order decently, otherwise it wastes memory and IO bandwidth to
> read in pages that are unlikely to be needed soon.
> 
> However, the physical swap slot layout diverges from the LRU order
> with increasing swap activity, i.e. high memory pressure situations,
> and this is exactly the situation where swapin should not waste any
> memory or IO bandwidth as both are the most contended resources at
> this point.
> 
> Another approximation for LRU-relation is the VMA order as groups of
> VMA-related pages are usually used together.
> 
> This patch combines both the physical and the virtual hint to get a
> good approximation of pages that are sensible to read ahead.
> 
> When both diverge, we either read unrelated data, seek heavily for
> related data, or, what this patch does, just decrease the readahead
> efforts.
> 
> To achieve this, we have essentially two readahead windows of the same
> size: one spans the virtual, the other one the physical neighborhood
> of the faulting page.  We only read where both areas overlap.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> ---
>  include/linux/swap.h |    4 ++-
>  kernel/sysctl.c      |    7 ++++-
>  mm/memory.c          |   55 +++++++++++++++++++++++++++++++++++++++++
>  mm/shmem.c           |    4 +--
>  mm/swap_state.c      |   67 ++++++++++++++++++++++++++++++++++++++-------------
>  5 files changed, 116 insertions(+), 21 deletions(-)
> 
> version 3:
>   o move pte selection to callee (per Hugh)
>   o limit ra ptes to one pmd entry to avoid multiple
>     locking/mapping of highptes (per Hugh)
> 
> version 2:
>   o fall back to physical ra window for shmem
>   o add documentation to the new ra algorithm (per Andrew)
> 
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -327,27 +327,14 @@ struct page *read_swap_cache_async(swp_e
>  	return found_page;
>  }
>  
> -/**
> - * swapin_readahead - swap in pages in hope we need them soon
> - * @entry: swap entry of this memory
> - * @gfp_mask: memory allocation flags
> - * @vma: user vma this address belongs to
> - * @addr: target address for mempolicy
> - *
> - * Returns the struct page for entry and addr, after queueing swapin.
> - *
> +/*
>   * Primitive swap readahead code. We simply read an aligned block of
>   * (1 << page_cluster) entries in the swap area. This method is chosen
>   * because it doesn't cost us any seek time.  We also make sure to queue
>   * the 'original' request together with the readahead ones...
> - *
> - * This has been extended to use the NUMA policies from the mm triggering
> - * the readahead.
> - *
> - * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
>   */
> -struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> -			struct vm_area_struct *vma, unsigned long addr)
> +static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
> +				struct vm_area_struct *vma, unsigned long addr)
>  {
>  	int nr_pages;
>  	struct page *page;
> @@ -373,3 +360,51 @@ struct page *swapin_readahead(swp_entry_
>  	lru_add_drain();	/* Push any new pages onto the LRU now */
>  	return read_swap_cache_async(entry, gfp_mask, vma, addr);
>  }
> +
> +/**
> + * swapin_readahead - swap in pages in hope we need them soon
> + * @entry: swap entry of this memory
> + * @gfp_mask: memory allocation flags
> + * @vma: user vma this address belongs to
> + * @addr: target address for mempolicy
> + * @entries: swap slots to consider reading
> + * @nr_entries: number of @entries
> + * @cluster: readahead window size in swap slots
> + *
> + * Returns the struct page for entry and addr, after queueing swapin.
> + *
> + * This has been extended to use the NUMA policies from the mm
> + * triggering the readahead.
> + *
> + * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
> + */
> +struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> +			struct vm_area_struct *vma, unsigned long addr,
> +			swp_entry_t *entries, int nr_entries,
> +			unsigned long cluster)
> +{
> +	unsigned long pmin, pmax;
> +	int i;
> +
> +	if (!entries)	/* XXX: shmem case */
> +		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> +	pmin = swp_offset(entry) & ~(cluster - 1);
> +	pmax = pmin + cluster;
> +	for (i = 0; i < nr_entries; i++) {
> +		swp_entry_t swp = entries[i];
> +		struct page *page;
> +
> +		if (swp_type(swp) != swp_type(entry))
> +			continue;
> +		if (swp_offset(swp) > pmax)
> +			continue;
> +		if (swp_offset(swp) < pmin)
> +			continue;
> +		page = read_swap_cache_async(swp, gfp_mask, vma, addr);
> +		if (!page)
> +			break;
> +		page_cache_release(page);
> +	}
> +	lru_add_drain();	/* Push any new pages onto the LRU now */
> +	return read_swap_cache_async(entry, gfp_mask, vma, addr);
> +}
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -292,7 +292,9 @@ extern struct page *lookup_swap_cache(sw
>  extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
>  			struct vm_area_struct *vma, unsigned long addr);
>  extern struct page *swapin_readahead(swp_entry_t, gfp_t,
> -			struct vm_area_struct *vma, unsigned long addr);
> +			struct vm_area_struct *vma, unsigned long addr,
> +			swp_entry_t *entries, int nr_entries,
> +			unsigned long cluster);
>  
>  /* linux/mm/swapfile.c */
>  extern long nr_swap_pages;
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2440,6 +2440,54 @@ int vmtruncate_range(struct inode *inode
>  }
>  
>  /*
> + * The readahead window is the virtual area around the faulting page,
> + * where the physical proximity of the swap slots is taken into
> + * account as well in swapin_readahead().
> + *
> + * While the swap allocation algorithm tries to keep LRU-related pages
> + * together on the swap backing, it is not reliable on heavy thrashing
> + * systems where concurrent reclaimers allocate swap slots and/or most
> + * anonymous memory pages are already in swap cache.
> + *
> + * On the virtual side, subgroups of VMA-related pages are usually
> + * used together, which gives another hint to LRU relationship.
> + *
> + * By taking both aspects into account, we get a good approximation of
> + * which pages are sensible to read together with the faulting one.
> + */
> +static int swap_readahead_ptes(struct mm_struct *mm,
> +			unsigned long addr, pmd_t *pmd,
> +			swp_entry_t *entries,
> +			unsigned long cluster)
> +{
> +	unsigned long window, min, max, limit;
> +	spinlock_t *ptl;
> +	pte_t *ptep;
> +	int i, nr;
> +
> +	window = cluster << PAGE_SHIFT;
> +	min = addr & ~(window - 1);
> +	max = min + cluster;
> +	/*
> +	 * To keep the locking/highpte mapping simple, stay
> +	 * within the PTE range of one PMD entry.
> +	 */
> +	limit = addr & PMD_MASK;
> +	if (limit > min)
> +		min = limit;
> +	limit = pmd_addr_end(addr, max);
> +	if (limit < max)
> +		max = limit;
> +	limit = max - min;
> +	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
> +	for (i = nr = 0; i < limit; i++)
> +		if (is_swap_pte(ptep[i]))
> +			entries[nr++] = pte_to_swp_entry(ptep[i]);
> +	pte_unmap_unlock(ptep, ptl);
> +	return nr;
> +}
> +
> +/*
>   * We enter with non-exclusive mmap_sem (to exclude vma changes,
>   * but allow concurrent faults), and pte mapped but not yet locked.
>   * We return with mmap_sem still held, but pte unmapped and unlocked.
> @@ -2466,9 +2514,14 @@ static int do_swap_page(struct mm_struct
>  	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
>  	page = lookup_swap_cache(entry);
>  	if (!page) {
> +		int nr, cluster = 1 << page_cluster;
> +		swp_entry_t entries[cluster];
> +
>  		grab_swap_token(); /* Contend for token _before_ read-in */
> +		nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
>  		page = swapin_readahead(entry,
> -					GFP_HIGHUSER_MOVABLE, vma, address);
> +					GFP_HIGHUSER_MOVABLE, vma, address,
> +					entries, nr, cluster);
>  		if (!page) {
>  			/*
>  			 * Back out if somebody else faulted in this pte
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1148,7 +1148,7 @@ static struct page *shmem_swapin(swp_ent
>  	pvma.vm_pgoff = idx;
>  	pvma.vm_ops = NULL;
>  	pvma.vm_policy = spol;
> -	page = swapin_readahead(entry, gfp, &pvma, 0);
> +	page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
>  	return page;
>  }
>  
> @@ -1178,7 +1178,7 @@ static inline void shmem_show_mpol(struc
>  static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
>  			struct shmem_inode_info *info, unsigned long idx)
>  {
> -	return swapin_readahead(entry, gfp, NULL, 0);
> +	return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
>  }
>  
>  static inline struct page *shmem_alloc_page(gfp_t gfp,
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -112,6 +112,8 @@ static int min_percpu_pagelist_fract = 8
>  
>  static int ngroups_max = NGROUPS_MAX;
>  
> +static int page_cluster_max = 5;
> +
>  #ifdef CONFIG_MODULES
>  extern char modprobe_path[];
>  #endif
> @@ -966,7 +968,10 @@ static struct ctl_table vm_table[] = {
>  		.data		= &page_cluster,
>  		.maxlen		= sizeof(int),
>  		.mode		= 0644,
> -		.proc_handler	= &proc_dointvec,
> +		.proc_handler	= &proc_dointvec_minmax,
> +		.strategy	= &sysctl_intvec,
> +		.extra1		= &zero,
> +		.extra2		= &page_cluster_max,
>  	},
>  	{
>  		.ctl_name	= VM_DIRTY_BACKGROUND,
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-09 19:37   ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-09 19:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Hugh Dickins, Andi Kleen, Wu Fengguang,
	KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel

On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> [resend with lists cc'd, sorry]

[and fixed Hugh's email.  crap]

> Hi,
> 
> here is a new iteration of the virtual swap readahead.  Per Hugh's
> suggestion, I moved the pte collecting to the callsite and thus out
> ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> array of that many swap entries on the stack, but I think it is better
> to limit the cluster size to a sane maximum than using dynamic
> allocation for this purpose.
> 
> Thanks all for the helpful suggestions.  KAMEZAWA-san and Minchan, I
> didn't incorporate your ideas in this patch as I think they belong in
> a different one with their own justifications.  I didn't ignore them.
> 
>        Hannes
> 
> ---
> The current swap readahead implementation reads a physically
> contiguous group of swap slots around the faulting page to take
> advantage of the disk head's position and in the hope that the
> surrounding pages will be needed soon as well.
> 
> This works as long as the physical swap slot order approximates the
> LRU order decently, otherwise it wastes memory and IO bandwidth to
> read in pages that are unlikely to be needed soon.
> 
> However, the physical swap slot layout diverges from the LRU order
> with increasing swap activity, i.e. high memory pressure situations,
> and this is exactly the situation where swapin should not waste any
> memory or IO bandwidth as both are the most contended resources at
> this point.
> 
> Another approximation for LRU-relation is the VMA order as groups of
> VMA-related pages are usually used together.
> 
> This patch combines both the physical and the virtual hint to get a
> good approximation of pages that are sensible to read ahead.
> 
> When both diverge, we either read unrelated data, seek heavily for
> related data, or, what this patch does, just decrease the readahead
> efforts.
> 
> To achieve this, we have essentially two readahead windows of the same
> size: one spans the virtual, the other one the physical neighborhood
> of the faulting page.  We only read where both areas overlap.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> ---
>  include/linux/swap.h |    4 ++-
>  kernel/sysctl.c      |    7 ++++-
>  mm/memory.c          |   55 +++++++++++++++++++++++++++++++++++++++++
>  mm/shmem.c           |    4 +--
>  mm/swap_state.c      |   67 ++++++++++++++++++++++++++++++++++++++-------------
>  5 files changed, 116 insertions(+), 21 deletions(-)
> 
> version 3:
>   o move pte selection to callee (per Hugh)
>   o limit ra ptes to one pmd entry to avoid multiple
>     locking/mapping of highptes (per Hugh)
> 
> version 2:
>   o fall back to physical ra window for shmem
>   o add documentation to the new ra algorithm (per Andrew)
> 
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -327,27 +327,14 @@ struct page *read_swap_cache_async(swp_e
>  	return found_page;
>  }
>  
> -/**
> - * swapin_readahead - swap in pages in hope we need them soon
> - * @entry: swap entry of this memory
> - * @gfp_mask: memory allocation flags
> - * @vma: user vma this address belongs to
> - * @addr: target address for mempolicy
> - *
> - * Returns the struct page for entry and addr, after queueing swapin.
> - *
> +/*
>   * Primitive swap readahead code. We simply read an aligned block of
>   * (1 << page_cluster) entries in the swap area. This method is chosen
>   * because it doesn't cost us any seek time.  We also make sure to queue
>   * the 'original' request together with the readahead ones...
> - *
> - * This has been extended to use the NUMA policies from the mm triggering
> - * the readahead.
> - *
> - * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
>   */
> -struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> -			struct vm_area_struct *vma, unsigned long addr)
> +static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
> +				struct vm_area_struct *vma, unsigned long addr)
>  {
>  	int nr_pages;
>  	struct page *page;
> @@ -373,3 +360,51 @@ struct page *swapin_readahead(swp_entry_
>  	lru_add_drain();	/* Push any new pages onto the LRU now */
>  	return read_swap_cache_async(entry, gfp_mask, vma, addr);
>  }
> +
> +/**
> + * swapin_readahead - swap in pages in hope we need them soon
> + * @entry: swap entry of this memory
> + * @gfp_mask: memory allocation flags
> + * @vma: user vma this address belongs to
> + * @addr: target address for mempolicy
> + * @entries: swap slots to consider reading
> + * @nr_entries: number of @entries
> + * @cluster: readahead window size in swap slots
> + *
> + * Returns the struct page for entry and addr, after queueing swapin.
> + *
> + * This has been extended to use the NUMA policies from the mm
> + * triggering the readahead.
> + *
> + * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
> + */
> +struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> +			struct vm_area_struct *vma, unsigned long addr,
> +			swp_entry_t *entries, int nr_entries,
> +			unsigned long cluster)
> +{
> +	unsigned long pmin, pmax;
> +	int i;
> +
> +	if (!entries)	/* XXX: shmem case */
> +		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> +	pmin = swp_offset(entry) & ~(cluster - 1);
> +	pmax = pmin + cluster;
> +	for (i = 0; i < nr_entries; i++) {
> +		swp_entry_t swp = entries[i];
> +		struct page *page;
> +
> +		if (swp_type(swp) != swp_type(entry))
> +			continue;
> +		if (swp_offset(swp) > pmax)
> +			continue;
> +		if (swp_offset(swp) < pmin)
> +			continue;
> +		page = read_swap_cache_async(swp, gfp_mask, vma, addr);
> +		if (!page)
> +			break;
> +		page_cache_release(page);
> +	}
> +	lru_add_drain();	/* Push any new pages onto the LRU now */
> +	return read_swap_cache_async(entry, gfp_mask, vma, addr);
> +}
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -292,7 +292,9 @@ extern struct page *lookup_swap_cache(sw
>  extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
>  			struct vm_area_struct *vma, unsigned long addr);
>  extern struct page *swapin_readahead(swp_entry_t, gfp_t,
> -			struct vm_area_struct *vma, unsigned long addr);
> +			struct vm_area_struct *vma, unsigned long addr,
> +			swp_entry_t *entries, int nr_entries,
> +			unsigned long cluster);
>  
>  /* linux/mm/swapfile.c */
>  extern long nr_swap_pages;
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2440,6 +2440,54 @@ int vmtruncate_range(struct inode *inode
>  }
>  
>  /*
> + * The readahead window is the virtual area around the faulting page,
> + * where the physical proximity of the swap slots is taken into
> + * account as well in swapin_readahead().
> + *
> + * While the swap allocation algorithm tries to keep LRU-related pages
> + * together on the swap backing, it is not reliable on heavy thrashing
> + * systems where concurrent reclaimers allocate swap slots and/or most
> + * anonymous memory pages are already in swap cache.
> + *
> + * On the virtual side, subgroups of VMA-related pages are usually
> + * used together, which gives another hint to LRU relationship.
> + *
> + * By taking both aspects into account, we get a good approximation of
> + * which pages are sensible to read together with the faulting one.
> + */
> +static int swap_readahead_ptes(struct mm_struct *mm,
> +			unsigned long addr, pmd_t *pmd,
> +			swp_entry_t *entries,
> +			unsigned long cluster)
> +{
> +	unsigned long window, min, max, limit;
> +	spinlock_t *ptl;
> +	pte_t *ptep;
> +	int i, nr;
> +
> +	window = cluster << PAGE_SHIFT;
> +	min = addr & ~(window - 1);
> +	max = min + cluster;
> +	/*
> +	 * To keep the locking/highpte mapping simple, stay
> +	 * within the PTE range of one PMD entry.
> +	 */
> +	limit = addr & PMD_MASK;
> +	if (limit > min)
> +		min = limit;
> +	limit = pmd_addr_end(addr, max);
> +	if (limit < max)
> +		max = limit;
> +	limit = max - min;
> +	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
> +	for (i = nr = 0; i < limit; i++)
> +		if (is_swap_pte(ptep[i]))
> +			entries[nr++] = pte_to_swp_entry(ptep[i]);
> +	pte_unmap_unlock(ptep, ptl);
> +	return nr;
> +}
> +
> +/*
>   * We enter with non-exclusive mmap_sem (to exclude vma changes,
>   * but allow concurrent faults), and pte mapped but not yet locked.
>   * We return with mmap_sem still held, but pte unmapped and unlocked.
> @@ -2466,9 +2514,14 @@ static int do_swap_page(struct mm_struct
>  	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
>  	page = lookup_swap_cache(entry);
>  	if (!page) {
> +		int nr, cluster = 1 << page_cluster;
> +		swp_entry_t entries[cluster];
> +
>  		grab_swap_token(); /* Contend for token _before_ read-in */
> +		nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
>  		page = swapin_readahead(entry,
> -					GFP_HIGHUSER_MOVABLE, vma, address);
> +					GFP_HIGHUSER_MOVABLE, vma, address,
> +					entries, nr, cluster);
>  		if (!page) {
>  			/*
>  			 * Back out if somebody else faulted in this pte
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1148,7 +1148,7 @@ static struct page *shmem_swapin(swp_ent
>  	pvma.vm_pgoff = idx;
>  	pvma.vm_ops = NULL;
>  	pvma.vm_policy = spol;
> -	page = swapin_readahead(entry, gfp, &pvma, 0);
> +	page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
>  	return page;
>  }
>  
> @@ -1178,7 +1178,7 @@ static inline void shmem_show_mpol(struc
>  static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
>  			struct shmem_inode_info *info, unsigned long idx)
>  {
> -	return swapin_readahead(entry, gfp, NULL, 0);
> +	return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
>  }
>  
>  static inline struct page *shmem_alloc_page(gfp_t gfp,
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -112,6 +112,8 @@ static int min_percpu_pagelist_fract = 8
>  
>  static int ngroups_max = NGROUPS_MAX;
>  
> +static int page_cluster_max = 5;
> +
>  #ifdef CONFIG_MODULES
>  extern char modprobe_path[];
>  #endif
> @@ -966,7 +968,10 @@ static struct ctl_table vm_table[] = {
>  		.data		= &page_cluster,
>  		.maxlen		= sizeof(int),
>  		.mode		= 0644,
> -		.proc_handler	= &proc_dointvec,
> +		.proc_handler	= &proc_dointvec_minmax,
> +		.strategy	= &sysctl_intvec,
> +		.extra1		= &zero,
> +		.extra2		= &page_cluster_max,
>  	},
>  	{
>  		.ctl_name	= VM_DIRTY_BACKGROUND,
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-09 19:37   ` Johannes Weiner
@ 2009-06-10  5:03     ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-10  5:03 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel

On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > [resend with lists cc'd, sorry]
> 
> [and fixed Hugh's email.  crap]
> 
> > Hi,
> > 
> > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > suggestion, I moved the pte collecting to the callsite and thus out
> > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > array of that many swap entries on the stack, but I think it is better
> > to limit the cluster size to a sane maximum than using dynamic
> > allocation for this purpose.

Hi Johannes,

When stress testing your patch, I found it triggered many OOM kills.
Around the time of last OOMs, the memory usage is:

             total       used       free     shared    buffers     cached
Mem:           474        468          5          0          0        239
-/+ buffers/cache:        229        244
Swap:         1023        221        802

Thanks,
Fengguang
---

full kernel log:

[  472.528487] /usr/games/glch invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  472.537228] Pid: 4361, comm: /usr/games/glch Not tainted 2.6.30-rc8-mm1 #301
[  472.544293] Call Trace:
[  472.546762]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  472.552259]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  472.558010]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  472.563250]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  472.568991]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  472.574499]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  472.580858]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  472.586871]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  472.592614]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  472.599222]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  472.605926]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  472.610987]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  472.616558]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  472.621786]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  472.627874]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  472.633658]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  472.639258]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  472.644413] Mem-Info:
[  472.646698] Node 0 DMA per-cpu:
[  472.649855] CPU    0: hi:    0, btch:   1 usd:   0
[  472.654649] CPU    1: hi:    0, btch:   1 usd:   0
[  472.659439] Node 0 DMA32 per-cpu:
[  472.662774] CPU    0: hi:  186, btch:  31 usd: 114
[  472.667560] CPU    1: hi:  186, btch:  31 usd:  81
[  472.672350] Active_anon:43340 active_file:774 inactive_anon:46297
[  472.672351]  inactive_file:2095 unevictable:4 dirty:0 writeback:0 unstable:0
[  472.672352]  free:1334 slab:13888 mapped:3528 pagetables:7580 bounce:0
[  472.692012] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:4892kB inactive_anon:6200kB active_file:12kB inactive_file:172kB unevictable:0kB present:15164kB pages_scanned:6752 all_unreclaimable? no
[  472.711031] lowmem_reserve[]: 0 483 483 483
[  472.715313] Node 0 DMA32 free:3320kB min:2768kB low:3460kB high:4152kB active_anon:168468kB inactive_anon:179064kB active_file:3084kB inactive_file:8208kB unevictable:16kB present:495008kB pages_scanned:265856 all_unreclaimable? no
[  472.735793] lowmem_reserve[]: 0 0 0 0
[  472.739546] Node 0 DMA: 21*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[  472.750386] Node 0 DMA32: 220*4kB 23*8kB 17*16kB 14*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 3320kB
[  472.761754] 63776 total pagecache pages
[  472.765589] 9263 pages in swap cache
[  472.769162] Swap cache stats: add 166054, delete 156791, find 14174/51560
[  472.775943] Free swap  = 689708kB
[  472.779264] Total swap = 1048568kB
[  472.786832] 131072 pages RAM
[  472.789713] 9628 pages reserved
[  472.792861] 86958 pages shared
[  472.795921] 56805 pages non-shared
[  472.799325] Out of memory: kill process 3514 (run-many-x-apps) score 1495085 or a child
[  472.807327] Killed process 3516 (xeyes)
[  473.861300] gnobots2 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  473.868615] Pid: 4533, comm: gnobots2 Not tainted 2.6.30-rc8-mm1 #301
[  473.875196] Call Trace:
[  473.877669]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  473.883155]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  473.888919]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  473.894141]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  473.899881]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  473.905362]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  473.911711]  [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[  473.917276]  [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[  473.923451]  [<ffffffff810e9de5>] swapin_readahead+0xb5/0x170
[  473.929194]  [<ffffffff810dac5d>] do_swap_page+0x3fd/0x500
[  473.934677]  [<ffffffff810e9913>] ? lookup_swap_cache+0x13/0x30
[  473.940585]  [<ffffffff810da8da>] ? do_swap_page+0x7a/0x500
[  473.946152]  [<ffffffff810dc70e>] handle_mm_fault+0x44e/0x500
[  473.951898]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  473.957464]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  473.962601] Mem-Info:
[  473.964870] Node 0 DMA per-cpu:
[  473.968036] CPU    0: hi:    0, btch:   1 usd:   0
[  473.972818] CPU    1: hi:    0, btch:   1 usd:   0
[  473.977601] Node 0 DMA32 per-cpu:
[  473.980930] CPU    0: hi:  186, btch:  31 usd:  78
[  473.985718] CPU    1: hi:  186, btch:  31 usd:  79
[  473.990512] Active_anon:43366 active_file:728 inactive_anon:46639
[  473.990513]  inactive_file:2442 unevictable:4 dirty:0 writeback:0 unstable:0
[  473.990515]  free:1187 slab:13677 mapped:3344 pagetables:7560 bounce:0
[  474.010136] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:4872kB inactive_anon:6360kB active_file:28kB inactive_file:96kB unevictable:0kB present:15164kB pages_scanned:15568 all_unreclaimable? no
[  474.029143] lowmem_reserve[]: 0 483 483 483
[  474.033403] Node 0 DMA32 free:2740kB min:2768kB low:3460kB high:4152kB active_anon:168592kB inactive_anon:180308kB active_file:2884kB inactive_file:9672kB unevictable:16kB present:495008kB pages_scanned:627904 all_unreclaimable? yes
[  474.053974] lowmem_reserve[]: 0 0 0 0
[  474.057721] Node 0 DMA: 16*4kB 3*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[  474.068556] Node 0 DMA32: 105*4kB 6*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2740kB
[  474.079825] 64075 total pagecache pages
[  474.083660] 9277 pages in swap cache
[  474.087235] Swap cache stats: add 166129, delete 156852, find 14175/51619
[  474.094011] Free swap  = 690168kB
[  474.097327] Total swap = 1048568kB
[  474.104333] 131072 pages RAM
[  474.107225] 9628 pages reserved
[  474.110363] 84659 pages shared
[  474.113409] 57530 pages non-shared
[  474.116816] Out of memory: kill process 3514 (run-many-x-apps) score 1490267 or a child
[  474.124811] Killed process 3593 (gthumb)
[  480.443446] gnome-network-p invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  480.451749] Pid: 5242, comm: gnome-network-p Not tainted 2.6.30-rc8-mm1 #301
[  480.458883] Call Trace:
[  480.461362]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  480.467248]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  480.473025]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  480.478294]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  480.484050]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  480.489546]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  480.495920]  [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[  480.501509]  [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[  480.507718]  [<ffffffff810e9de5>] swapin_readahead+0xb5/0x170
[  480.513477]  [<ffffffff810dac5d>] do_swap_page+0x3fd/0x500
[  480.518982]  [<ffffffff810e9913>] ? lookup_swap_cache+0x13/0x30
[  480.524917]  [<ffffffff810da8da>] ? do_swap_page+0x7a/0x500
[  480.530515]  [<ffffffff810dc70e>] handle_mm_fault+0x44e/0x500
[  480.536273]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  480.541865]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  480.547023] Mem-Info:
[  480.549305] Node 0 DMA per-cpu:
[  480.552485] CPU    0: hi:    0, btch:   1 usd:   0
[  480.557293] CPU    1: hi:    0, btch:   1 usd:   0
[  480.562106] Node 0 DMA32 per-cpu:
[  480.565450] CPU    0: hi:  186, btch:  31 usd: 166
[  480.570260] CPU    1: hi:  186, btch:  31 usd:  54
[  480.575072] Active_anon:43200 active_file:1328 inactive_anon:46633
[  480.575077]  inactive_file:2266 unevictable:4 dirty:0 writeback:0 unstable:0
[  480.575081]  free:1175 slab:13522 mapped:4094 pagetables:7430 bounce:0
[  480.594826] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5048kB inactive_anon:6228kB active_file:24kB inactive_file:92kB unevictable:0kB present:15164kB pages_scanned:20576 all_unreclaimable? yes
[  480.613968] lowmem_reserve[]: 0 483 483 483
[  480.618302] Node 0 DMA32 free:2696kB min:2768kB low:3460kB high:4152kB active_anon:167804kB inactive_anon:180304kB active_file:5324kB inactive_file:9012kB unevictable:16kB present:495008kB pages_scanned:698592 all_unreclaimable? yes
[  480.638902] lowmem_reserve[]: 0 0 0 0
[  480.642709] Node 0 DMA: 15*4kB 1*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1988kB
[  480.653661] Node 0 DMA32: 100*4kB 5*8kB 15*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2696kB
[  480.665062] 64296 total pagecache pages
[  480.668909] 9027 pages in swap cache
[  480.672486] Swap cache stats: add 166520, delete 157493, find 14190/51963
[  480.679265] Free swap  = 697604kB
[  480.682590] Total swap = 1048568kB
[  480.692920] 131072 pages RAM
[  480.695835] 9628 pages reserved
[  480.698989] 83496 pages shared
[  480.702055] 56997 pages non-shared
[  480.705460] Out of memory: kill process 3514 (run-many-x-apps) score 1233725 or a child
[  480.713480] Killed process 3620 (gedit)
[  485.239788] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  485.247180] Pid: 3407, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[  485.253879] Call Trace:
[  485.256340]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  485.261825]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  485.267587]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  485.272810]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  485.278556]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  485.284034]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  485.290383]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  485.296384]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  485.302127]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  485.308729]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  485.315421]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  485.320471]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  485.326044]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  485.331264]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  485.337348]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  485.343091]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  485.348660]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  485.353794] Mem-Info:
[  485.356074] Node 0 DMA per-cpu:
[  485.359238] CPU    0: hi:    0, btch:   1 usd:   0
[  485.364022] CPU    1: hi:    0, btch:   1 usd:   0
[  485.368805] Node 0 DMA32 per-cpu:
[  485.372130] CPU    0: hi:  186, btch:  31 usd:  86
[  485.376917] CPU    1: hi:  186, btch:  31 usd:  65
[  485.381704] Active_anon:43069 active_file:1343 inactive_anon:46566
[  485.381705]  inactive_file:2264 unevictable:4 dirty:0 writeback:0 unstable:0
[  485.381706]  free:1177 slab:13765 mapped:3976 pagetables:7336 bounce:0
[  485.401416] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5096kB inactive_anon:6228kB active_file:24kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:14624 all_unreclaimable? no
[  485.420352] lowmem_reserve[]: 0 483 483 483
[  485.424627] Node 0 DMA32 free:2708kB min:2768kB low:3460kB high:4152kB active_anon:167180kB inactive_anon:180036kB active_file:5348kB inactive_file:9072kB unevictable:16kB present:495008kB pages_scanned:700592 all_unreclaimable? yes
[  485.445209] lowmem_reserve[]: 0 0 0 0
[  485.448983] Node 0 DMA: 25*4kB 1*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  485.459812] Node 0 DMA32: 97*4kB 8*8kB 15*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2708kB
[  485.470995] 64132 total pagecache pages
[  485.474826] 8910 pages in swap cache
[  485.478397] Swap cache stats: add 166970, delete 158060, find 14213/52337
[  485.485171] Free swap  = 704464kB
[  485.488481] Total swap = 1048568kB
[  485.495505] 131072 pages RAM
[  485.498400] 9628 pages reserved
[  485.501539] 80730 pages shared
[  485.504593] 57330 pages non-shared
[  485.507994] Out of memory: kill process 3514 (run-many-x-apps) score 1208843 or a child
[  485.515986] Killed process 3653 (xpdf.bin)
[  487.520227] blackjack invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  487.527723] Pid: 4579, comm: blackjack Not tainted 2.6.30-rc8-mm1 #301
[  487.534650] Call Trace:
[  487.537290]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  487.542782]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  487.548533]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  487.553767]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  487.559522]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  487.565003]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  487.571353]  [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[  487.576933]  [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[  487.583117]  [<ffffffff810e9e19>] swapin_readahead+0xe9/0x170
[  487.588860]  [<ffffffff810d1167>] shmem_getpage+0x607/0x970
[  487.594432]  [<ffffffff810a9c8b>] ? delayacct_end+0x6b/0xa0
[  487.600003]  [<ffffffff810a9caa>] ? delayacct_end+0x8a/0xa0
[  487.605571]  [<ffffffff810a9d2f>] ? __delayacct_blkio_end+0x2f/0x50
[  487.611837]  [<ffffffff81542132>] ? io_schedule+0x82/0xb0
[  487.617229]  [<ffffffff8107ca35>] ? print_lock_contention_bug+0x25/0x120
[  487.623927]  [<ffffffff810c0970>] ? sync_page+0x0/0x80
[  487.629060]  [<ffffffff810c0700>] ? find_get_page+0x0/0x110
[  487.634633]  [<ffffffff81052702>] ? current_fs_time+0x22/0x30
[  487.640372]  [<ffffffff810d9983>] ? __do_fault+0x153/0x510
[  487.645849]  [<ffffffff8107ca35>] ? print_lock_contention_bug+0x25/0x120
[  487.652542]  [<ffffffff810d151a>] shmem_fault+0x4a/0x80
[  487.657762]  [<ffffffff812444a9>] shm_fault+0x19/0x20
[  487.662819]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  487.668036]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  487.674125]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  487.679867]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  487.685434]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  487.690570] Mem-Info:
[  487.692836] Node 0 DMA per-cpu:
[  487.696003] CPU    0: hi:    0, btch:   1 usd:   0
[  487.700790] CPU    1: hi:    0, btch:   1 usd:   0
[  487.705578] Node 0 DMA32 per-cpu:
[  487.708906] CPU    0: hi:  186, btch:  31 usd: 142
[  487.713698] CPU    1: hi:  186, btch:  31 usd:  77
[  487.718498] Active_anon:42533 active_file:677 inactive_anon:46561
[  487.718499]  inactive_file:3214 unevictable:4 dirty:0 writeback:0 unstable:0
[  487.718500]  free:1573 slab:13680 mapped:3351 pagetables:7308 bounce:0
[  487.738125] Node 0 DMA free:2064kB min:84kB low:104kB high:124kB active_anon:5152kB inactive_anon:6328kB active_file:8kB inactive_file:92kB unevictable:0kB present:15164kB pages_scanned:1586 all_unreclaimable? no
[  487.756958] lowmem_reserve[]: 0 483 483 483
[  487.761221] Node 0 DMA32 free:4228kB min:2768kB low:3460kB high:4152kB active_anon:164980kB inactive_anon:180068kB active_file:2700kB inactive_file:12764kB unevictable:16kB present:495008kB pages_scanned:42720 all_unreclaimable? no
[  487.781711] lowmem_reserve[]: 0 0 0 0
[  487.785458] Node 0 DMA: 37*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2068kB
[  487.796294] Node 0 DMA32: 271*4kB 105*8kB 16*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 4228kB
[  487.807722] 64270 total pagecache pages
[  487.811557] 8728 pages in swap cache
[  487.815132] Swap cache stats: add 167087, delete 158359, find 14218/52435
[  487.821908] Free swap  = 711028kB
[  487.825220] Total swap = 1048568kB
[  487.832277] 131072 pages RAM
[  487.835178] 9628 pages reserved
[  487.838317] 76338 pages shared
[  487.841364] 57425 pages non-shared
[  487.844768] Out of memory: kill process 3514 (run-many-x-apps) score 1201219 or a child
[  487.852761] Killed process 3696 (xterm)
[  487.857092] tty_ldisc_deref: no references.
[  489.747066] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  489.754480] Pid: 5404, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[  489.761179] Call Trace:
[  489.763640]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  489.769123]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  489.774870]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  489.780090]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  489.785830]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  489.791315]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  489.797665]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  489.803672]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  489.809409]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  489.816020]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  489.822723]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  489.827771]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  489.833338]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  489.838565]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  489.844653]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  489.850404]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  489.855970]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  489.861101] Mem-Info:
[  489.863375] Node 0 DMA per-cpu:
[  489.866538] CPU    0: hi:    0, btch:   1 usd:   0
[  489.871327] CPU    1: hi:    0, btch:   1 usd:   0
[  489.876114] Node 0 DMA32 per-cpu:
[  489.879450] CPU    0: hi:  186, btch:  31 usd: 139
[  489.884235] CPU    1: hi:  186, btch:  31 usd: 168
[  489.889020] Active_anon:42548 active_file:713 inactive_anon:46654
[  489.889022]  inactive_file:3551 unevictable:4 dirty:0 writeback:0 unstable:0
[  489.889023]  free:1191 slab:13619 mapped:3463 pagetables:7277 bounce:0
[  489.908648] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5156kB inactive_anon:6324kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:18048 all_unreclaimable? yes
[  489.927583] lowmem_reserve[]: 0 483 483 483
[  489.931852] Node 0 DMA32 free:2764kB min:2768kB low:3460kB high:4152kB active_anon:165036kB inactive_anon:180292kB active_file:2852kB inactive_file:14204kB unevictable:16kB present:495008kB pages_scanned:598624 all_unreclaimable? yes
[  489.952505] lowmem_reserve[]: 0 0 0 0
[  489.956255] Node 0 DMA: 24*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  489.967104] Node 0 DMA32: 67*4kB 16*8kB 20*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2764kB
[  489.978371] 64571 total pagecache pages
[  489.982209] 8716 pages in swap cache
[  489.985779] Swap cache stats: add 167160, delete 158444, find 14228/52496
[  489.992561] Free swap  = 712436kB
[  489.995878] Total swap = 1048568kB
[  490.003023] 131072 pages RAM
[  490.005917] 9628 pages reserved
[  490.009051] 77164 pages shared
[  490.012111] 57863 pages non-shared
[  490.015516] Out of memory: kill process 3514 (run-many-x-apps) score 1193943 or a child
[  490.023514] Killed process 3789 (gnome-terminal)
[  490.042359] gnome-terminal invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  490.050059] Pid: 3817, comm: gnome-terminal Not tainted 2.6.30-rc8-mm1 #301
[  490.057019] Call Trace:
[  490.059490]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  490.064986]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  490.070743]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  490.075981]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  490.081738]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  490.087245]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  490.093606]  [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[  490.099200]  [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[  490.105390]  [<ffffffff810e9de5>] swapin_readahead+0xb5/0x170
[  490.111157]  [<ffffffff810dac5d>] do_swap_page+0x3fd/0x500
[  490.116651]  [<ffffffff810e9913>] ? lookup_swap_cache+0x13/0x30
[  490.122581]  [<ffffffff810da8da>] ? do_swap_page+0x7a/0x500
[  490.128166]  [<ffffffff810dc70e>] handle_mm_fault+0x44e/0x500
[  490.133932]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  490.139510]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  490.144658]  [<ffffffff8127600c>] ? __get_user_8+0x1c/0x23
[  490.150157]  [<ffffffff810806ad>] ? exit_robust_list+0x5d/0x160
[  490.156088]  [<ffffffff81077c4d>] ? trace_hardirqs_off+0xd/0x10
[  490.162026]  [<ffffffff81544f97>] ? _spin_unlock_irqrestore+0x67/0x70
[  490.168473]  [<ffffffff8104ae5d>] mm_release+0xed/0x100
[  490.173707]  [<ffffffff8104f653>] exit_mm+0x23/0x150
[  490.178684]  [<ffffffff81544f1b>] ? _spin_unlock_irq+0x2b/0x40
[  490.184528]  [<ffffffff81051208>] do_exit+0x138/0x880
[  490.189593]  [<ffffffff8105e757>] ? get_signal_to_deliver+0x67/0x430
[  490.195967]  [<ffffffff81051998>] do_group_exit+0x48/0xd0
[  490.201373]  [<ffffffff8105e9d4>] get_signal_to_deliver+0x2e4/0x430
[  490.207653]  [<ffffffff8100b332>] do_notify_resume+0xc2/0x820
[  490.213410]  [<ffffffff81012859>] ? sched_clock+0x9/0x10
[  490.218743]  [<ffffffff81077c85>] ? lock_release_holdtime+0x35/0x1c0
[  490.225102]  [<ffffffff810fd768>] ? vfs_read+0xc8/0x1a0
[  490.230340]  [<ffffffff8100c057>] sysret_signal+0x83/0xd9
[  490.235750] Mem-Info:
[  490.238041] Node 0 DMA per-cpu:
[  490.241213] CPU    0: hi:    0, btch:   1 usd:   0
[  490.246023] CPU    1: hi:    0, btch:   1 usd:   0
[  490.250817] Node 0 DMA32 per-cpu:
[  490.254173] CPU    0: hi:  186, btch:  31 usd: 139
[  490.258976] CPU    1: hi:  186, btch:  31 usd: 169
[  490.263781] Active_anon:42548 active_file:713 inactive_anon:46660
[  490.263784]  inactive_file:3551 unevictable:4 dirty:0 writeback:0 unstable:0
[  490.263787]  free:1191 slab:13619 mapped:3463 pagetables:7277 bounce:0
[  490.283433] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5156kB inactive_anon:6324kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:18048 all_unreclaimable? yes
[  490.302379] lowmem_reserve[]: 0 483 483 483
[  490.306699] Node 0 DMA32 free:2764kB min:2768kB low:3460kB high:4152kB active_anon:165036kB inactive_anon:180316kB active_file:2852kB inactive_file:14204kB unevictable:16kB present:495008kB pages_scanned:616288 all_unreclaimable? yes
[  490.327380] lowmem_reserve[]: 0 0 0 0
[  490.331178] Node 0 DMA: 24*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  490.342134] Node 0 DMA32: 67*4kB 16*8kB 20*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2764kB
[  490.353506] 64571 total pagecache pages
[  490.357357] 8716 pages in swap cache
[  490.360943] Swap cache stats: add 167160, delete 158444, find 14228/52497
[  490.367735] Free swap  = 712436kB
[  490.371063] Total swap = 1048568kB
[  490.381335] 131072 pages RAM
[  490.384247] 9628 pages reserved
[  490.387398] 77163 pages shared
[  490.390461] 57864 pages non-shared
[  491.721918] tty_ldisc_deref: no references.
[  507.974133] Xorg invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  507.981095] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[  507.987465] Call Trace:
[  507.990171]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  507.995670]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  508.001413]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  508.006640]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  508.012378]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  508.017857]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  508.024207]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  508.030211]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  508.035951]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  508.042555]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  508.049248]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  508.054298]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  508.059864]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  508.065082]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  508.071170]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  508.076916]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  508.082488]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  508.087617] Mem-Info:
[  508.089890] Node 0 DMA per-cpu:
[  508.093045] CPU    0: hi:    0, btch:   1 usd:   0
[  508.097831] CPU    1: hi:    0, btch:   1 usd:   0
[  508.102618] Node 0 DMA32 per-cpu:
[  508.105949] CPU    0: hi:  186, btch:  31 usd:  70
[  508.110732] CPU    1: hi:  186, btch:  31 usd:  35
[  508.115518] Active_anon:43375 active_file:1606 inactive_anon:46595
[  508.115519]  inactive_file:2431 unevictable:4 dirty:0 writeback:0 unstable:0
[  508.115520]  free:1171 slab:13500 mapped:4464 pagetables:7137 bounce:0
[  508.135223] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5372kB inactive_anon:6304kB active_file:48kB inactive_file:152kB unevictable:0kB present:15164kB pages_scanned:18016 all_unreclaimable? yes
[  508.154402] lowmem_reserve[]: 0 483 483 483
[  508.158670] Node 0 DMA32 free:2684kB min:2768kB low:3460kB high:4152kB active_anon:168128kB inactive_anon:180076kB active_file:6376kB inactive_file:9572kB unevictable:16kB present:495008kB pages_scanned:574528 all_unreclaimable? yes
[  508.179230] lowmem_reserve[]: 0 0 0 0
[  508.182977] Node 0 DMA: 20*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[  508.193806] Node 0 DMA32: 81*4kB 9*8kB 17*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2684kB
[  508.204972] 64466 total pagecache pages
[  508.208804] 8648 pages in swap cache
[  508.212374] Swap cache stats: add 169110, delete 160462, find 14531/53889
[  508.219151] Free swap  = 723636kB
[  508.222465] Total swap = 1048568kB
[  508.229465] 131072 pages RAM
[  508.232364] 9628 pages reserved
[  508.235504] 80834 pages shared
[  508.238558] 57150 pages non-shared
[  508.241961] Out of memory: kill process 3514 (run-many-x-apps) score 1142844 or a child
[  508.249954] Killed process 3828 (urxvt)
[  508.254826] tty_ldisc_deref: no references.
[  518.644007] /usr/games/gnom invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  518.652048] Pid: 4284, comm: /usr/games/gnom Not tainted 2.6.30-rc8-mm1 #301
[  518.659110] Call Trace:
[  518.661572]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  518.667060]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  518.672805]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  518.678036]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  518.683779]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  518.689265]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  518.695629]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  518.701648]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  518.707396]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  518.714015]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  518.720728]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  518.725782]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  518.731376]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  518.736610]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  518.742724]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  518.748470]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  518.754050]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  518.759186] Mem-Info:
[  518.761457] Node 0 DMA per-cpu:
[  518.764622] CPU    0: hi:    0, btch:   1 usd:   0
[  518.769433] CPU    1: hi:    0, btch:   1 usd:   0
[  518.774250] Node 0 DMA32 per-cpu:
[  518.777607] CPU    0: hi:  186, btch:  31 usd: 122
[  518.782429] CPU    1: hi:  186, btch:  31 usd: 140
[  518.787320] Active_anon:43558 active_file:800 inactive_anon:46596
[  518.787322]  inactive_file:3200 unevictable:4 dirty:0 writeback:1 unstable:0
[  518.787324]  free:1170 slab:13276 mapped:3632 pagetables:7067 bounce:0
[  518.806969] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5392kB inactive_anon:6284kB active_file:8kB inactive_file:192kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  518.825631] lowmem_reserve[]: 0 483 483 483
[  518.829894] Node 0 DMA32 free:2676kB min:2768kB low:3460kB high:4152kB active_anon:168840kB inactive_anon:180100kB active_file:3192kB inactive_file:12608kB unevictable:16kB present:495008kB pages_scanned:2752 all_unreclaimable? no
[  518.850287] lowmem_reserve[]: 0 0 0 0
[  518.854034] Node 0 DMA: 17*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2004kB
[  518.864860] Node 0 DMA32: 51*4kB 9*8kB 22*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2676kB
[  518.876047] 64523 total pagecache pages
[  518.879879] 8754 pages in swap cache
[  518.883453] Swap cache stats: add 169415, delete 160661, find 14593/54101
[  518.890231] Free swap  = 727320kB
[  518.893549] Total swap = 1048568kB
[  518.900474] 131072 pages RAM
[  518.903375] 9628 pages reserved
[  518.906522] 75910 pages shared
[  518.909579] 57545 pages non-shared
[  518.912975] Out of memory: kill process 3514 (run-many-x-apps) score 1125494 or a child
[  518.920971] Killed process 3913 (gnome-system-mo)
[  664.508168] Xorg invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  664.514995] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[  664.521111] Call Trace:
[  664.523568]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  664.529049]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  664.534794]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  664.540021]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  664.545757]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  664.551235]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  664.557591]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  664.563593]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  664.569336]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  664.575947]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  664.582648]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  664.587710]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  664.593282]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  664.598508]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  664.604603]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  664.610357]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  664.615937]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  664.621071] Mem-Info:
[  664.623341] Node 0 DMA per-cpu:
[  664.626517] CPU    0: hi:    0, btch:   1 usd:   0
[  664.631305] CPU    1: hi:    0, btch:   1 usd:   0
[  664.636096] Node 0 DMA32 per-cpu:
[  664.639430] CPU    0: hi:  186, btch:  31 usd: 108
[  664.644229] CPU    1: hi:  186, btch:  31 usd: 104
[  664.649022] Active_anon:42958 active_file:868 inactive_anon:46862
[  664.649024]  inactive_file:3541 unevictable:4 dirty:0 writeback:0 unstable:0
[  664.649026]  free:1182 slab:13288 mapped:3904 pagetables:7002 bounce:0
[  664.668657] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5528kB inactive_anon:6256kB active_file:0kB inactive_file:56kB unevictable:0kB present:15164kB pages_scanned:17829 all_unreclaimable? yes
[  664.687670] lowmem_reserve[]: 0 483 483 483
[  664.691974] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:166304kB inactive_anon:181192kB active_file:3472kB inactive_file:14108kB unevictable:16kB present:495008kB pages_scanned:561984 all_unreclaimable? yes
[  664.712637] lowmem_reserve[]: 0 0 0 0
[  664.716412] Node 0 DMA: 21*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[  664.727297] Node 0 DMA32: 83*4kB 9*8kB 17*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2724kB
[  664.738494] 64381 total pagecache pages
[  664.742329] 7902 pages in swap cache
[  664.745909] Swap cache stats: add 174458, delete 166556, find 14826/56928
[  664.752696] Free swap  = 734732kB
[  664.756012] Total swap = 1048568kB
[  664.763953] 131072 pages RAM
[  664.766845] 9628 pages reserved
[  664.769992] 74903 pages shared
[  664.773047] 58244 pages non-shared
[  664.776465] Out of memory: kill process 3514 (run-many-x-apps) score 1094818 or a child
[  664.784464] Killed process 3941 (gnome-help)
[  700.167781] Xorg invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[  700.174355] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[  700.180473] Call Trace:
[  700.182949]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  700.188480]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  700.194247]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  700.199501]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  700.205257]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  700.210748]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  700.217115]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  700.223132]  [<ffffffff810c73f9>] __get_free_pages+0x9/0x50
[  700.228731]  [<ffffffff8110e3c2>] __pollwait+0xc2/0x100
[  700.233966]  [<ffffffff814958c3>] unix_poll+0x23/0xc0
[  700.239025]  [<ffffffff81419a88>] sock_poll+0x18/0x20
[  700.244095]  [<ffffffff8110d969>] do_select+0x3e9/0x730
[  700.249333]  [<ffffffff8110d580>] ? do_select+0x0/0x730
[  700.254575]  [<ffffffff8110e300>] ? __pollwait+0x0/0x100
[  700.259909]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.264976]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.270034]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.275093]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.280157]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.285223]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.290287]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.295360]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.300416]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.305475]  [<ffffffff8110deaf>] core_sys_select+0x1ff/0x330
[  700.311225]  [<ffffffff8110dcf8>] ? core_sys_select+0x48/0x330
[  700.317068]  [<ffffffffa014954c>] ? i915_gem_throttle_ioctl+0x4c/0x60 [i915]
[  700.324109]  [<ffffffff810fcf9a>] ? do_readv_writev+0x16a/0x1f0
[  700.330037]  [<ffffffff810706bc>] ? getnstimeofday+0x5c/0xf0
[  700.335708]  [<ffffffff8106aca9>] ? ktime_get_ts+0x59/0x60
[  700.341207]  [<ffffffff8110e23a>] sys_select+0x4a/0x110
[  700.346450]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  700.352471] Mem-Info:
[  700.354744] Node 0 DMA per-cpu:
[  700.357931] CPU    0: hi:    0, btch:   1 usd:   0
[  700.362728] CPU    1: hi:    0, btch:   1 usd:   0
[  700.367528] Node 0 DMA32 per-cpu:
[  700.370869] CPU    0: hi:  186, btch:  31 usd: 124
[  700.375681] CPU    1: hi:  186, btch:  31 usd: 109
[  700.380485] Active_anon:42750 active_file:1211 inactive_anon:46836
[  700.380487]  inactive_file:3834 unevictable:4 dirty:0 writeback:0 unstable:0
[  700.380490]  free:1185 slab:13047 mapped:4269 pagetables:6879 bounce:0
[  700.400224] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5504kB inactive_anon:6244kB active_file:4kB inactive_file:20kB unevictable:0kB present:15164kB pages_scanned:21160 all_unreclaimable? no
[  700.419171] lowmem_reserve[]: 0 483 483 483
[  700.423495] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:165496kB inactive_anon:181100kB active_file:4840kB inactive_file:15316kB unevictable:16kB present:495008kB pages_scanned:749440 all_unreclaimable? yes
[  700.444177] lowmem_reserve[]: 0 0 0 0
[  700.447982] Node 0 DMA: 24*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  700.458919] Node 0 DMA32: 95*4kB 7*8kB 15*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2724kB
[  700.470109] 64769 total pagecache pages
[  700.473944] 7685 pages in swap cache
[  700.477521] Swap cache stats: add 174858, delete 167173, find 14884/57219
[  700.484305] Free swap  = 756796kB
[  700.487619] Total swap = 1048568kB
[  700.495533] 131072 pages RAM
[  700.498435] 9628 pages reserved
[  700.501585] 75677 pages shared
[  700.504647] 57992 pages non-shared
[  700.508062] Out of memory: kill process 3514 (run-many-x-apps) score 920259 or a child
[  700.515981] Killed process 3972 (gnome-dictionar)
[  772.754850] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  772.762316] Pid: 3363, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[  772.769042] Call Trace:
[  772.771532]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  772.777056]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  772.782830]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  772.788093]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  772.793861]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  772.799371]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  772.805903]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  772.812044]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  772.817979]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  772.824833]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  772.831934]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  772.837201]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  772.843077]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  772.848298]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  772.854384]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  772.860126]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  772.865693]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  772.870831] Mem-Info:
[  772.873099] Node 0 DMA per-cpu:
[  772.876268] CPU    0: hi:    0, btch:   1 usd:   0
[  772.881052] CPU    1: hi:    0, btch:   1 usd:   0
[  772.885837] Node 0 DMA32 per-cpu:
[  772.889177] CPU    0: hi:  186, btch:  31 usd: 119
[  772.893970] CPU    1: hi:  186, btch:  31 usd: 131
[  772.898771] Active_anon:42925 active_file:967 inactive_anon:46822
[  772.898773]  inactive_file:3951 unevictable:4 dirty:0 writeback:0 unstable:0
[  772.898775]  free:1195 slab:13130 mapped:4261 pagetables:6775 bounce:0
[  772.918425] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5572kB inactive_anon:6228kB active_file:0kB inactive_file:28kB unevictable:0kB present:15164kB pages_scanned:1152 all_unreclaimable? no
[  772.937282] lowmem_reserve[]: 0 483 483 483
[  772.941583] Node 0 DMA32 free:2780kB min:2768kB low:3460kB high:4152kB active_anon:166128kB inactive_anon:181060kB active_file:3868kB inactive_file:15776kB unevictable:16kB present:495008kB pages_scanned:31168 all_unreclaimable? no
[  772.962096] lowmem_reserve[]: 0 0 0 0
[  772.965848] Node 0 DMA: 19*4kB 3*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[  772.976695] Node 0 DMA32: 113*4kB 7*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2780kB
[  772.987966] 64559 total pagecache pages
[  772.991800] 7639 pages in swap cache
[  772.995376] Swap cache stats: add 175606, delete 167967, find 14965/57706
[  773.002155] Free swap  = 761820kB
[  773.005474] Total swap = 1048568kB
[  773.012974] 131072 pages RAM
[  773.015871] 9628 pages reserved
[  773.019017] 75524 pages shared
[  773.022066] 57891 pages non-shared
[  773.025474] Out of memory: kill process 3514 (run-many-x-apps) score 892555 or a child
[  773.033387] Killed process 4039 (sol)
[  794.790990] NFS: Server wrote zero bytes, expected 120.
[  822.483490] Xorg invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  822.490772] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[  822.496918] Call Trace:
[  822.499384]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  822.504871]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  822.510622]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  822.515851]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  822.521593]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  822.527081]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  822.533429]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  822.539434]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  822.545175]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  822.551788]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  822.558481]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  822.563528]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  822.569098]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  822.574327]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  822.580413]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  822.586157]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  822.591727]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  822.596859] Mem-Info:
[  822.599136] Node 0 DMA per-cpu:
[  822.602299] CPU    0: hi:    0, btch:   1 usd:   0
[  822.607084] CPU    1: hi:    0, btch:   1 usd:   0
[  822.611869] Node 0 DMA32 per-cpu:
[  822.615198] CPU    0: hi:  186, btch:  31 usd:  91
[  822.619985] CPU    1: hi:  186, btch:  31 usd:  98
[  822.624773] Active_anon:43566 active_file:835 inactive_anon:46874
[  822.624774]  inactive_file:3327 unevictable:4 dirty:0 writeback:0 unstable:0
[  822.624775]  free:1187 slab:13349 mapped:3843 pagetables:6679 bounce:0
[  822.644402] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5648kB inactive_anon:6260kB active_file:24kB inactive_file:72kB unevictable:0kB present:15164kB pages_scanned:20672 all_unreclaimable? yes
[  822.663507] lowmem_reserve[]: 0 483 483 483
[  822.667773] Node 0 DMA32 free:2748kB min:2768kB low:3460kB high:4152kB active_anon:168616kB inactive_anon:181236kB active_file:3316kB inactive_file:13236kB unevictable:16kB present:495008kB pages_scanned:729026 all_unreclaimable? yes
[  822.688432] lowmem_reserve[]: 0 0 0 0
[  822.692178] Node 0 DMA: 16*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[  822.703015] Node 0 DMA32: 53*4kB 31*8kB 15*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2748kB
[  822.714282] 63870 total pagecache pages
[  822.718120] 7714 pages in swap cache
[  822.721687] Swap cache stats: add 177378, delete 169664, find 15255/58971
[  822.728470] Free swap  = 772080kB
[  822.731787] Total swap = 1048568kB
[  822.738767] 131072 pages RAM
[  822.741648] 9628 pages reserved
[  822.744800] 78480 pages shared
[  822.747857] 58328 pages non-shared
[  822.751262] Out of memory: kill process 3514 (run-many-x-apps) score 874039 or a child
[  822.759173] Killed process 4071 (gnometris)
[  838.434074] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  838.441560] Pid: 5500, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[  838.448286] Call Trace:
[  838.450770]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  838.456279]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  838.462053]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  838.467299]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  838.473064]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  838.478570]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  838.484930]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  838.490953]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  838.496714]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  838.503346]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  838.510056]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  838.515121]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  838.520707]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  838.525955]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  838.532058]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  838.537819]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  838.543405]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  838.548553] Mem-Info:
[  838.550844] Node 0 DMA per-cpu:
[  838.554023] CPU    0: hi:    0, btch:   1 usd:   0
[  838.558818] CPU    1: hi:    0, btch:   1 usd:   0
[  838.563614] Node 0 DMA32 per-cpu:
[  838.566959] CPU    0: hi:  186, btch:  31 usd: 174
[  838.571767] CPU    1: hi:  186, btch:  31 usd:  87
[  838.576579] Active_anon:43520 active_file:718 inactive_anon:46874
[  838.576582]  inactive_file:3607 unevictable:4 dirty:0 writeback:0 unstable:0
[  838.576584]  free:1193 slab:13228 mapped:4138 pagetables:6608 bounce:0
[  838.596232] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5620kB inactive_anon:6260kB active_file:28kB inactive_file:72kB unevictable:0kB present:15164kB pages_scanned:18848 all_unreclaimable? yes
[  838.615367] lowmem_reserve[]: 0 483 483 483
[  838.619678] Node 0 DMA32 free:2764kB min:2768kB low:3460kB high:4152kB active_anon:168460kB inactive_anon:181236kB active_file:2844kB inactive_file:14356kB unevictable:16kB present:495008kB pages_scanned:585548 all_unreclaimable? yes
[  838.640372] lowmem_reserve[]: 0 0 0 0
[  838.644163] Node 0 DMA: 18*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[  838.655125] Node 0 DMA32: 109*4kB 7*8kB 16*16kB 14*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2732kB
[  838.666499] 64009 total pagecache pages
[  838.670350] 7656 pages in swap cache
[  838.673941] Swap cache stats: add 177561, delete 169905, find 15273/59126
[  838.680734] Free swap  = 791892kB
[  838.684060] Total swap = 1048568kB
[  838.694532] 131072 pages RAM
[  838.697436] 9628 pages reserved
[  838.700590] 73594 pages shared
[  838.703661] 58166 pages non-shared
[  838.707076] Out of memory: kill process 3514 (run-many-x-apps) score 853023 or a child
[  838.714995] Killed process 4104 (gnect)
[  889.461532] scim-panel-gtk invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  889.469205] Pid: 3360, comm: scim-panel-gtk Not tainted 2.6.30-rc8-mm1 #301
[  889.476177] Call Trace:
[  889.478662]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  889.484172]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  889.489944]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  889.495191]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  889.500962]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  889.506455]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  889.512814]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  889.518831]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  889.524591]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  889.531220]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  889.537930]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  889.542994]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  889.548580]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  889.553829]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  889.559928]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  889.565694]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  889.571281]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  889.576428] Mem-Info:
[  889.578716] Node 0 DMA per-cpu:
[  889.581897] CPU    0: hi:    0, btch:   1 usd:   0
[  889.586693] CPU    1: hi:    0, btch:   1 usd:   0
[  889.591489] Node 0 DMA32 per-cpu:
[  889.594838] CPU    0: hi:  186, btch:  31 usd:  27
[  889.599639] CPU    1: hi:  186, btch:  31 usd:  52
[  889.604447] Active_anon:43571 active_file:1739 inactive_anon:47198
[  889.604450]  inactive_file:2522 unevictable:4 dirty:0 writeback:0 unstable:0
[  889.604453]  free:1172 slab:13250 mapped:4789 pagetables:6476 bounce:0
[  889.624188] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5672kB inactive_anon:6228kB active_file:0kB inactive_file:28kB unevictable:0kB present:15164kB pages_scanned:18758 all_unreclaimable? yes
[  889.643237] lowmem_reserve[]: 0 483 483 483
[  889.647549] Node 0 DMA32 free:2676kB min:2768kB low:3460kB high:4152kB active_anon:168612kB inactive_anon:182564kB active_file:6956kB inactive_file:10060kB unevictable:16kB present:495008kB pages_scanned:562004 all_unreclaimable? yes
[  889.668244] lowmem_reserve[]: 0 0 0 0
[  889.672043] Node 0 DMA: 19*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  889.683006] Node 0 DMA32: 85*4kB 8*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2676kB
[  889.694298] 63465 total pagecache pages
[  889.698147] 7133 pages in swap cache
[  889.701736] Swap cache stats: add 181169, delete 174036, find 15337/60473
[  889.708527] Free swap  = 795216kB
[  889.711853] Total swap = 1048568kB
[  889.722306] 131072 pages RAM
[  889.725220] 9628 pages reserved
[  889.728368] 73642 pages shared
[  889.731430] 58217 pages non-shared
[  889.734842] Out of memory: kill process 3314 (gnome-session) score 875272 or a child
[  889.742589] Killed process 3345 (ssh-agent)
[  889.753188] urxvt invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  889.760064] Pid: 3364, comm: urxvt Not tainted 2.6.30-rc8-mm1 #301
[  889.766248] Call Trace:
[  889.768709]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  889.774212]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  889.779963]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  889.785202]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  889.790961]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  889.796460]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  889.802839]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  889.808867]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  889.814622]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  889.821253]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  889.827970]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  889.833050]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  889.838635]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  889.843875]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  889.849989]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  889.855753]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  889.861356]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  889.866503] Mem-Info:
[  889.868779] Node 0 DMA per-cpu:
[  889.871969] CPU    0: hi:    0, btch:   1 usd:   0
[  889.876771] CPU    1: hi:    0, btch:   1 usd:   0
[  889.881590] Node 0 DMA32 per-cpu:
[  889.884950] CPU    0: hi:  186, btch:  31 usd:  27
[  889.889752] CPU    1: hi:  186, btch:  31 usd:  83
[  889.894557] Active_anon:43568 active_file:1748 inactive_anon:47202
[  889.894560]  inactive_file:2532 unevictable:4 dirty:0 writeback:0 unstable:0
[  889.894562]  free:1172 slab:13256 mapped:4800 pagetables:6457 bounce:0
[  889.914305] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5672kB inactive_anon:6244kB active_file:16kB inactive_file:36kB unevictable:0kB present:15164kB pages_scanned:18758 all_unreclaimable? yes
[  889.933431] lowmem_reserve[]: 0 483 483 483
[  889.937757] Node 0 DMA32 free:2676kB min:2768kB low:3460kB high:4152kB active_anon:168600kB inactive_anon:182564kB active_file:6976kB inactive_file:10092kB unevictable:16kB present:495008kB pages_scanned:572756 all_unreclaimable? yes
[  889.958441] lowmem_reserve[]: 0 0 0 0
[  889.962251] Node 0 DMA: 19*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  889.973218] Node 0 DMA32: 85*4kB 8*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2676kB
[  889.984510] 63470 total pagecache pages
[  889.988363] 7128 pages in swap cache
[  889.991956] Swap cache stats: add 181169, delete 174041, find 15337/60473
[  889.998764] Free swap  = 795628kB
[  890.002089] Total swap = 1048568kB
[  890.012112] 131072 pages RAM
[  890.015034] 9628 pages reserved
[  890.018197] 73633 pages shared
[  890.021274] 58191 pages non-shared
[  890.024686] Out of memory: kill process 3314 (gnome-session) score 870770 or a child
[  890.032441] Killed process 3363 (firefox-bin)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10  5:03     ` Wu Fengguang
  0 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-10  5:03 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel

On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > [resend with lists cc'd, sorry]
> 
> [and fixed Hugh's email.  crap]
> 
> > Hi,
> > 
> > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > suggestion, I moved the pte collecting to the callsite and thus out
> > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > array of that many swap entries on the stack, but I think it is better
> > to limit the cluster size to a sane maximum than using dynamic
> > allocation for this purpose.

Hi Johannes,

When stress testing your patch, I found it triggered many OOM kills.
Around the time of last OOMs, the memory usage is:

             total       used       free     shared    buffers     cached
Mem:           474        468          5          0          0        239
-/+ buffers/cache:        229        244
Swap:         1023        221        802

Thanks,
Fengguang
---

full kernel log:

[  472.528487] /usr/games/glch invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  472.537228] Pid: 4361, comm: /usr/games/glch Not tainted 2.6.30-rc8-mm1 #301
[  472.544293] Call Trace:
[  472.546762]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  472.552259]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  472.558010]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  472.563250]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  472.568991]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  472.574499]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  472.580858]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  472.586871]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  472.592614]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  472.599222]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  472.605926]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  472.610987]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  472.616558]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  472.621786]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  472.627874]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  472.633658]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  472.639258]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  472.644413] Mem-Info:
[  472.646698] Node 0 DMA per-cpu:
[  472.649855] CPU    0: hi:    0, btch:   1 usd:   0
[  472.654649] CPU    1: hi:    0, btch:   1 usd:   0
[  472.659439] Node 0 DMA32 per-cpu:
[  472.662774] CPU    0: hi:  186, btch:  31 usd: 114
[  472.667560] CPU    1: hi:  186, btch:  31 usd:  81
[  472.672350] Active_anon:43340 active_file:774 inactive_anon:46297
[  472.672351]  inactive_file:2095 unevictable:4 dirty:0 writeback:0 unstable:0
[  472.672352]  free:1334 slab:13888 mapped:3528 pagetables:7580 bounce:0
[  472.692012] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:4892kB inactive_anon:6200kB active_file:12kB inactive_file:172kB unevictable:0kB present:15164kB pages_scanned:6752 all_unreclaimable? no
[  472.711031] lowmem_reserve[]: 0 483 483 483
[  472.715313] Node 0 DMA32 free:3320kB min:2768kB low:3460kB high:4152kB active_anon:168468kB inactive_anon:179064kB active_file:3084kB inactive_file:8208kB unevictable:16kB present:495008kB pages_scanned:265856 all_unreclaimable? no
[  472.735793] lowmem_reserve[]: 0 0 0 0
[  472.739546] Node 0 DMA: 21*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[  472.750386] Node 0 DMA32: 220*4kB 23*8kB 17*16kB 14*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 3320kB
[  472.761754] 63776 total pagecache pages
[  472.765589] 9263 pages in swap cache
[  472.769162] Swap cache stats: add 166054, delete 156791, find 14174/51560
[  472.775943] Free swap  = 689708kB
[  472.779264] Total swap = 1048568kB
[  472.786832] 131072 pages RAM
[  472.789713] 9628 pages reserved
[  472.792861] 86958 pages shared
[  472.795921] 56805 pages non-shared
[  472.799325] Out of memory: kill process 3514 (run-many-x-apps) score 1495085 or a child
[  472.807327] Killed process 3516 (xeyes)
[  473.861300] gnobots2 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  473.868615] Pid: 4533, comm: gnobots2 Not tainted 2.6.30-rc8-mm1 #301
[  473.875196] Call Trace:
[  473.877669]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  473.883155]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  473.888919]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  473.894141]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  473.899881]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  473.905362]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  473.911711]  [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[  473.917276]  [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[  473.923451]  [<ffffffff810e9de5>] swapin_readahead+0xb5/0x170
[  473.929194]  [<ffffffff810dac5d>] do_swap_page+0x3fd/0x500
[  473.934677]  [<ffffffff810e9913>] ? lookup_swap_cache+0x13/0x30
[  473.940585]  [<ffffffff810da8da>] ? do_swap_page+0x7a/0x500
[  473.946152]  [<ffffffff810dc70e>] handle_mm_fault+0x44e/0x500
[  473.951898]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  473.957464]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  473.962601] Mem-Info:
[  473.964870] Node 0 DMA per-cpu:
[  473.968036] CPU    0: hi:    0, btch:   1 usd:   0
[  473.972818] CPU    1: hi:    0, btch:   1 usd:   0
[  473.977601] Node 0 DMA32 per-cpu:
[  473.980930] CPU    0: hi:  186, btch:  31 usd:  78
[  473.985718] CPU    1: hi:  186, btch:  31 usd:  79
[  473.990512] Active_anon:43366 active_file:728 inactive_anon:46639
[  473.990513]  inactive_file:2442 unevictable:4 dirty:0 writeback:0 unstable:0
[  473.990515]  free:1187 slab:13677 mapped:3344 pagetables:7560 bounce:0
[  474.010136] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:4872kB inactive_anon:6360kB active_file:28kB inactive_file:96kB unevictable:0kB present:15164kB pages_scanned:15568 all_unreclaimable? no
[  474.029143] lowmem_reserve[]: 0 483 483 483
[  474.033403] Node 0 DMA32 free:2740kB min:2768kB low:3460kB high:4152kB active_anon:168592kB inactive_anon:180308kB active_file:2884kB inactive_file:9672kB unevictable:16kB present:495008kB pages_scanned:627904 all_unreclaimable? yes
[  474.053974] lowmem_reserve[]: 0 0 0 0
[  474.057721] Node 0 DMA: 16*4kB 3*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[  474.068556] Node 0 DMA32: 105*4kB 6*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2740kB
[  474.079825] 64075 total pagecache pages
[  474.083660] 9277 pages in swap cache
[  474.087235] Swap cache stats: add 166129, delete 156852, find 14175/51619
[  474.094011] Free swap  = 690168kB
[  474.097327] Total swap = 1048568kB
[  474.104333] 131072 pages RAM
[  474.107225] 9628 pages reserved
[  474.110363] 84659 pages shared
[  474.113409] 57530 pages non-shared
[  474.116816] Out of memory: kill process 3514 (run-many-x-apps) score 1490267 or a child
[  474.124811] Killed process 3593 (gthumb)
[  480.443446] gnome-network-p invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  480.451749] Pid: 5242, comm: gnome-network-p Not tainted 2.6.30-rc8-mm1 #301
[  480.458883] Call Trace:
[  480.461362]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  480.467248]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  480.473025]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  480.478294]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  480.484050]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  480.489546]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  480.495920]  [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[  480.501509]  [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[  480.507718]  [<ffffffff810e9de5>] swapin_readahead+0xb5/0x170
[  480.513477]  [<ffffffff810dac5d>] do_swap_page+0x3fd/0x500
[  480.518982]  [<ffffffff810e9913>] ? lookup_swap_cache+0x13/0x30
[  480.524917]  [<ffffffff810da8da>] ? do_swap_page+0x7a/0x500
[  480.530515]  [<ffffffff810dc70e>] handle_mm_fault+0x44e/0x500
[  480.536273]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  480.541865]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  480.547023] Mem-Info:
[  480.549305] Node 0 DMA per-cpu:
[  480.552485] CPU    0: hi:    0, btch:   1 usd:   0
[  480.557293] CPU    1: hi:    0, btch:   1 usd:   0
[  480.562106] Node 0 DMA32 per-cpu:
[  480.565450] CPU    0: hi:  186, btch:  31 usd: 166
[  480.570260] CPU    1: hi:  186, btch:  31 usd:  54
[  480.575072] Active_anon:43200 active_file:1328 inactive_anon:46633
[  480.575077]  inactive_file:2266 unevictable:4 dirty:0 writeback:0 unstable:0
[  480.575081]  free:1175 slab:13522 mapped:4094 pagetables:7430 bounce:0
[  480.594826] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5048kB inactive_anon:6228kB active_file:24kB inactive_file:92kB unevictable:0kB present:15164kB pages_scanned:20576 all_unreclaimable? yes
[  480.613968] lowmem_reserve[]: 0 483 483 483
[  480.618302] Node 0 DMA32 free:2696kB min:2768kB low:3460kB high:4152kB active_anon:167804kB inactive_anon:180304kB active_file:5324kB inactive_file:9012kB unevictable:16kB present:495008kB pages_scanned:698592 all_unreclaimable? yes
[  480.638902] lowmem_reserve[]: 0 0 0 0
[  480.642709] Node 0 DMA: 15*4kB 1*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1988kB
[  480.653661] Node 0 DMA32: 100*4kB 5*8kB 15*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2696kB
[  480.665062] 64296 total pagecache pages
[  480.668909] 9027 pages in swap cache
[  480.672486] Swap cache stats: add 166520, delete 157493, find 14190/51963
[  480.679265] Free swap  = 697604kB
[  480.682590] Total swap = 1048568kB
[  480.692920] 131072 pages RAM
[  480.695835] 9628 pages reserved
[  480.698989] 83496 pages shared
[  480.702055] 56997 pages non-shared
[  480.705460] Out of memory: kill process 3514 (run-many-x-apps) score 1233725 or a child
[  480.713480] Killed process 3620 (gedit)
[  485.239788] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  485.247180] Pid: 3407, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[  485.253879] Call Trace:
[  485.256340]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  485.261825]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  485.267587]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  485.272810]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  485.278556]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  485.284034]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  485.290383]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  485.296384]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  485.302127]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  485.308729]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  485.315421]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  485.320471]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  485.326044]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  485.331264]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  485.337348]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  485.343091]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  485.348660]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  485.353794] Mem-Info:
[  485.356074] Node 0 DMA per-cpu:
[  485.359238] CPU    0: hi:    0, btch:   1 usd:   0
[  485.364022] CPU    1: hi:    0, btch:   1 usd:   0
[  485.368805] Node 0 DMA32 per-cpu:
[  485.372130] CPU    0: hi:  186, btch:  31 usd:  86
[  485.376917] CPU    1: hi:  186, btch:  31 usd:  65
[  485.381704] Active_anon:43069 active_file:1343 inactive_anon:46566
[  485.381705]  inactive_file:2264 unevictable:4 dirty:0 writeback:0 unstable:0
[  485.381706]  free:1177 slab:13765 mapped:3976 pagetables:7336 bounce:0
[  485.401416] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5096kB inactive_anon:6228kB active_file:24kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:14624 all_unreclaimable? no
[  485.420352] lowmem_reserve[]: 0 483 483 483
[  485.424627] Node 0 DMA32 free:2708kB min:2768kB low:3460kB high:4152kB active_anon:167180kB inactive_anon:180036kB active_file:5348kB inactive_file:9072kB unevictable:16kB present:495008kB pages_scanned:700592 all_unreclaimable? yes
[  485.445209] lowmem_reserve[]: 0 0 0 0
[  485.448983] Node 0 DMA: 25*4kB 1*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  485.459812] Node 0 DMA32: 97*4kB 8*8kB 15*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2708kB
[  485.470995] 64132 total pagecache pages
[  485.474826] 8910 pages in swap cache
[  485.478397] Swap cache stats: add 166970, delete 158060, find 14213/52337
[  485.485171] Free swap  = 704464kB
[  485.488481] Total swap = 1048568kB
[  485.495505] 131072 pages RAM
[  485.498400] 9628 pages reserved
[  485.501539] 80730 pages shared
[  485.504593] 57330 pages non-shared
[  485.507994] Out of memory: kill process 3514 (run-many-x-apps) score 1208843 or a child
[  485.515986] Killed process 3653 (xpdf.bin)
[  487.520227] blackjack invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  487.527723] Pid: 4579, comm: blackjack Not tainted 2.6.30-rc8-mm1 #301
[  487.534650] Call Trace:
[  487.537290]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  487.542782]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  487.548533]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  487.553767]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  487.559522]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  487.565003]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  487.571353]  [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[  487.576933]  [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[  487.583117]  [<ffffffff810e9e19>] swapin_readahead+0xe9/0x170
[  487.588860]  [<ffffffff810d1167>] shmem_getpage+0x607/0x970
[  487.594432]  [<ffffffff810a9c8b>] ? delayacct_end+0x6b/0xa0
[  487.600003]  [<ffffffff810a9caa>] ? delayacct_end+0x8a/0xa0
[  487.605571]  [<ffffffff810a9d2f>] ? __delayacct_blkio_end+0x2f/0x50
[  487.611837]  [<ffffffff81542132>] ? io_schedule+0x82/0xb0
[  487.617229]  [<ffffffff8107ca35>] ? print_lock_contention_bug+0x25/0x120
[  487.623927]  [<ffffffff810c0970>] ? sync_page+0x0/0x80
[  487.629060]  [<ffffffff810c0700>] ? find_get_page+0x0/0x110
[  487.634633]  [<ffffffff81052702>] ? current_fs_time+0x22/0x30
[  487.640372]  [<ffffffff810d9983>] ? __do_fault+0x153/0x510
[  487.645849]  [<ffffffff8107ca35>] ? print_lock_contention_bug+0x25/0x120
[  487.652542]  [<ffffffff810d151a>] shmem_fault+0x4a/0x80
[  487.657762]  [<ffffffff812444a9>] shm_fault+0x19/0x20
[  487.662819]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  487.668036]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  487.674125]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  487.679867]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  487.685434]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  487.690570] Mem-Info:
[  487.692836] Node 0 DMA per-cpu:
[  487.696003] CPU    0: hi:    0, btch:   1 usd:   0
[  487.700790] CPU    1: hi:    0, btch:   1 usd:   0
[  487.705578] Node 0 DMA32 per-cpu:
[  487.708906] CPU    0: hi:  186, btch:  31 usd: 142
[  487.713698] CPU    1: hi:  186, btch:  31 usd:  77
[  487.718498] Active_anon:42533 active_file:677 inactive_anon:46561
[  487.718499]  inactive_file:3214 unevictable:4 dirty:0 writeback:0 unstable:0
[  487.718500]  free:1573 slab:13680 mapped:3351 pagetables:7308 bounce:0
[  487.738125] Node 0 DMA free:2064kB min:84kB low:104kB high:124kB active_anon:5152kB inactive_anon:6328kB active_file:8kB inactive_file:92kB unevictable:0kB present:15164kB pages_scanned:1586 all_unreclaimable? no
[  487.756958] lowmem_reserve[]: 0 483 483 483
[  487.761221] Node 0 DMA32 free:4228kB min:2768kB low:3460kB high:4152kB active_anon:164980kB inactive_anon:180068kB active_file:2700kB inactive_file:12764kB unevictable:16kB present:495008kB pages_scanned:42720 all_unreclaimable? no
[  487.781711] lowmem_reserve[]: 0 0 0 0
[  487.785458] Node 0 DMA: 37*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2068kB
[  487.796294] Node 0 DMA32: 271*4kB 105*8kB 16*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 4228kB
[  487.807722] 64270 total pagecache pages
[  487.811557] 8728 pages in swap cache
[  487.815132] Swap cache stats: add 167087, delete 158359, find 14218/52435
[  487.821908] Free swap  = 711028kB
[  487.825220] Total swap = 1048568kB
[  487.832277] 131072 pages RAM
[  487.835178] 9628 pages reserved
[  487.838317] 76338 pages shared
[  487.841364] 57425 pages non-shared
[  487.844768] Out of memory: kill process 3514 (run-many-x-apps) score 1201219 or a child
[  487.852761] Killed process 3696 (xterm)
[  487.857092] tty_ldisc_deref: no references.
[  489.747066] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  489.754480] Pid: 5404, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[  489.761179] Call Trace:
[  489.763640]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  489.769123]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  489.774870]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  489.780090]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  489.785830]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  489.791315]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  489.797665]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  489.803672]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  489.809409]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  489.816020]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  489.822723]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  489.827771]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  489.833338]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  489.838565]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  489.844653]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  489.850404]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  489.855970]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  489.861101] Mem-Info:
[  489.863375] Node 0 DMA per-cpu:
[  489.866538] CPU    0: hi:    0, btch:   1 usd:   0
[  489.871327] CPU    1: hi:    0, btch:   1 usd:   0
[  489.876114] Node 0 DMA32 per-cpu:
[  489.879450] CPU    0: hi:  186, btch:  31 usd: 139
[  489.884235] CPU    1: hi:  186, btch:  31 usd: 168
[  489.889020] Active_anon:42548 active_file:713 inactive_anon:46654
[  489.889022]  inactive_file:3551 unevictable:4 dirty:0 writeback:0 unstable:0
[  489.889023]  free:1191 slab:13619 mapped:3463 pagetables:7277 bounce:0
[  489.908648] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5156kB inactive_anon:6324kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:18048 all_unreclaimable? yes
[  489.927583] lowmem_reserve[]: 0 483 483 483
[  489.931852] Node 0 DMA32 free:2764kB min:2768kB low:3460kB high:4152kB active_anon:165036kB inactive_anon:180292kB active_file:2852kB inactive_file:14204kB unevictable:16kB present:495008kB pages_scanned:598624 all_unreclaimable? yes
[  489.952505] lowmem_reserve[]: 0 0 0 0
[  489.956255] Node 0 DMA: 24*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  489.967104] Node 0 DMA32: 67*4kB 16*8kB 20*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2764kB
[  489.978371] 64571 total pagecache pages
[  489.982209] 8716 pages in swap cache
[  489.985779] Swap cache stats: add 167160, delete 158444, find 14228/52496
[  489.992561] Free swap  = 712436kB
[  489.995878] Total swap = 1048568kB
[  490.003023] 131072 pages RAM
[  490.005917] 9628 pages reserved
[  490.009051] 77164 pages shared
[  490.012111] 57863 pages non-shared
[  490.015516] Out of memory: kill process 3514 (run-many-x-apps) score 1193943 or a child
[  490.023514] Killed process 3789 (gnome-terminal)
[  490.042359] gnome-terminal invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  490.050059] Pid: 3817, comm: gnome-terminal Not tainted 2.6.30-rc8-mm1 #301
[  490.057019] Call Trace:
[  490.059490]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  490.064986]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  490.070743]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  490.075981]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  490.081738]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  490.087245]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  490.093606]  [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[  490.099200]  [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[  490.105390]  [<ffffffff810e9de5>] swapin_readahead+0xb5/0x170
[  490.111157]  [<ffffffff810dac5d>] do_swap_page+0x3fd/0x500
[  490.116651]  [<ffffffff810e9913>] ? lookup_swap_cache+0x13/0x30
[  490.122581]  [<ffffffff810da8da>] ? do_swap_page+0x7a/0x500
[  490.128166]  [<ffffffff810dc70e>] handle_mm_fault+0x44e/0x500
[  490.133932]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  490.139510]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  490.144658]  [<ffffffff8127600c>] ? __get_user_8+0x1c/0x23
[  490.150157]  [<ffffffff810806ad>] ? exit_robust_list+0x5d/0x160
[  490.156088]  [<ffffffff81077c4d>] ? trace_hardirqs_off+0xd/0x10
[  490.162026]  [<ffffffff81544f97>] ? _spin_unlock_irqrestore+0x67/0x70
[  490.168473]  [<ffffffff8104ae5d>] mm_release+0xed/0x100
[  490.173707]  [<ffffffff8104f653>] exit_mm+0x23/0x150
[  490.178684]  [<ffffffff81544f1b>] ? _spin_unlock_irq+0x2b/0x40
[  490.184528]  [<ffffffff81051208>] do_exit+0x138/0x880
[  490.189593]  [<ffffffff8105e757>] ? get_signal_to_deliver+0x67/0x430
[  490.195967]  [<ffffffff81051998>] do_group_exit+0x48/0xd0
[  490.201373]  [<ffffffff8105e9d4>] get_signal_to_deliver+0x2e4/0x430
[  490.207653]  [<ffffffff8100b332>] do_notify_resume+0xc2/0x820
[  490.213410]  [<ffffffff81012859>] ? sched_clock+0x9/0x10
[  490.218743]  [<ffffffff81077c85>] ? lock_release_holdtime+0x35/0x1c0
[  490.225102]  [<ffffffff810fd768>] ? vfs_read+0xc8/0x1a0
[  490.230340]  [<ffffffff8100c057>] sysret_signal+0x83/0xd9
[  490.235750] Mem-Info:
[  490.238041] Node 0 DMA per-cpu:
[  490.241213] CPU    0: hi:    0, btch:   1 usd:   0
[  490.246023] CPU    1: hi:    0, btch:   1 usd:   0
[  490.250817] Node 0 DMA32 per-cpu:
[  490.254173] CPU    0: hi:  186, btch:  31 usd: 139
[  490.258976] CPU    1: hi:  186, btch:  31 usd: 169
[  490.263781] Active_anon:42548 active_file:713 inactive_anon:46660
[  490.263784]  inactive_file:3551 unevictable:4 dirty:0 writeback:0 unstable:0
[  490.263787]  free:1191 slab:13619 mapped:3463 pagetables:7277 bounce:0
[  490.283433] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5156kB inactive_anon:6324kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:18048 all_unreclaimable? yes
[  490.302379] lowmem_reserve[]: 0 483 483 483
[  490.306699] Node 0 DMA32 free:2764kB min:2768kB low:3460kB high:4152kB active_anon:165036kB inactive_anon:180316kB active_file:2852kB inactive_file:14204kB unevictable:16kB present:495008kB pages_scanned:616288 all_unreclaimable? yes
[  490.327380] lowmem_reserve[]: 0 0 0 0
[  490.331178] Node 0 DMA: 24*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  490.342134] Node 0 DMA32: 67*4kB 16*8kB 20*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2764kB
[  490.353506] 64571 total pagecache pages
[  490.357357] 8716 pages in swap cache
[  490.360943] Swap cache stats: add 167160, delete 158444, find 14228/52497
[  490.367735] Free swap  = 712436kB
[  490.371063] Total swap = 1048568kB
[  490.381335] 131072 pages RAM
[  490.384247] 9628 pages reserved
[  490.387398] 77163 pages shared
[  490.390461] 57864 pages non-shared
[  491.721918] tty_ldisc_deref: no references.
[  507.974133] Xorg invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  507.981095] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[  507.987465] Call Trace:
[  507.990171]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  507.995670]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  508.001413]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  508.006640]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  508.012378]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  508.017857]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  508.024207]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  508.030211]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  508.035951]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  508.042555]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  508.049248]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  508.054298]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  508.059864]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  508.065082]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  508.071170]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  508.076916]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  508.082488]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  508.087617] Mem-Info:
[  508.089890] Node 0 DMA per-cpu:
[  508.093045] CPU    0: hi:    0, btch:   1 usd:   0
[  508.097831] CPU    1: hi:    0, btch:   1 usd:   0
[  508.102618] Node 0 DMA32 per-cpu:
[  508.105949] CPU    0: hi:  186, btch:  31 usd:  70
[  508.110732] CPU    1: hi:  186, btch:  31 usd:  35
[  508.115518] Active_anon:43375 active_file:1606 inactive_anon:46595
[  508.115519]  inactive_file:2431 unevictable:4 dirty:0 writeback:0 unstable:0
[  508.115520]  free:1171 slab:13500 mapped:4464 pagetables:7137 bounce:0
[  508.135223] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5372kB inactive_anon:6304kB active_file:48kB inactive_file:152kB unevictable:0kB present:15164kB pages_scanned:18016 all_unreclaimable? yes
[  508.154402] lowmem_reserve[]: 0 483 483 483
[  508.158670] Node 0 DMA32 free:2684kB min:2768kB low:3460kB high:4152kB active_anon:168128kB inactive_anon:180076kB active_file:6376kB inactive_file:9572kB unevictable:16kB present:495008kB pages_scanned:574528 all_unreclaimable? yes
[  508.179230] lowmem_reserve[]: 0 0 0 0
[  508.182977] Node 0 DMA: 20*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[  508.193806] Node 0 DMA32: 81*4kB 9*8kB 17*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2684kB
[  508.204972] 64466 total pagecache pages
[  508.208804] 8648 pages in swap cache
[  508.212374] Swap cache stats: add 169110, delete 160462, find 14531/53889
[  508.219151] Free swap  = 723636kB
[  508.222465] Total swap = 1048568kB
[  508.229465] 131072 pages RAM
[  508.232364] 9628 pages reserved
[  508.235504] 80834 pages shared
[  508.238558] 57150 pages non-shared
[  508.241961] Out of memory: kill process 3514 (run-many-x-apps) score 1142844 or a child
[  508.249954] Killed process 3828 (urxvt)
[  508.254826] tty_ldisc_deref: no references.
[  518.644007] /usr/games/gnom invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  518.652048] Pid: 4284, comm: /usr/games/gnom Not tainted 2.6.30-rc8-mm1 #301
[  518.659110] Call Trace:
[  518.661572]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  518.667060]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  518.672805]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  518.678036]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  518.683779]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  518.689265]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  518.695629]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  518.701648]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  518.707396]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  518.714015]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  518.720728]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  518.725782]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  518.731376]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  518.736610]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  518.742724]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  518.748470]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  518.754050]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  518.759186] Mem-Info:
[  518.761457] Node 0 DMA per-cpu:
[  518.764622] CPU    0: hi:    0, btch:   1 usd:   0
[  518.769433] CPU    1: hi:    0, btch:   1 usd:   0
[  518.774250] Node 0 DMA32 per-cpu:
[  518.777607] CPU    0: hi:  186, btch:  31 usd: 122
[  518.782429] CPU    1: hi:  186, btch:  31 usd: 140
[  518.787320] Active_anon:43558 active_file:800 inactive_anon:46596
[  518.787322]  inactive_file:3200 unevictable:4 dirty:0 writeback:1 unstable:0
[  518.787324]  free:1170 slab:13276 mapped:3632 pagetables:7067 bounce:0
[  518.806969] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5392kB inactive_anon:6284kB active_file:8kB inactive_file:192kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  518.825631] lowmem_reserve[]: 0 483 483 483
[  518.829894] Node 0 DMA32 free:2676kB min:2768kB low:3460kB high:4152kB active_anon:168840kB inactive_anon:180100kB active_file:3192kB inactive_file:12608kB unevictable:16kB present:495008kB pages_scanned:2752 all_unreclaimable? no
[  518.850287] lowmem_reserve[]: 0 0 0 0
[  518.854034] Node 0 DMA: 17*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2004kB
[  518.864860] Node 0 DMA32: 51*4kB 9*8kB 22*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2676kB
[  518.876047] 64523 total pagecache pages
[  518.879879] 8754 pages in swap cache
[  518.883453] Swap cache stats: add 169415, delete 160661, find 14593/54101
[  518.890231] Free swap  = 727320kB
[  518.893549] Total swap = 1048568kB
[  518.900474] 131072 pages RAM
[  518.903375] 9628 pages reserved
[  518.906522] 75910 pages shared
[  518.909579] 57545 pages non-shared
[  518.912975] Out of memory: kill process 3514 (run-many-x-apps) score 1125494 or a child
[  518.920971] Killed process 3913 (gnome-system-mo)
[  664.508168] Xorg invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  664.514995] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[  664.521111] Call Trace:
[  664.523568]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  664.529049]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  664.534794]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  664.540021]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  664.545757]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  664.551235]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  664.557591]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  664.563593]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  664.569336]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  664.575947]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  664.582648]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  664.587710]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  664.593282]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  664.598508]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  664.604603]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  664.610357]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  664.615937]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  664.621071] Mem-Info:
[  664.623341] Node 0 DMA per-cpu:
[  664.626517] CPU    0: hi:    0, btch:   1 usd:   0
[  664.631305] CPU    1: hi:    0, btch:   1 usd:   0
[  664.636096] Node 0 DMA32 per-cpu:
[  664.639430] CPU    0: hi:  186, btch:  31 usd: 108
[  664.644229] CPU    1: hi:  186, btch:  31 usd: 104
[  664.649022] Active_anon:42958 active_file:868 inactive_anon:46862
[  664.649024]  inactive_file:3541 unevictable:4 dirty:0 writeback:0 unstable:0
[  664.649026]  free:1182 slab:13288 mapped:3904 pagetables:7002 bounce:0
[  664.668657] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5528kB inactive_anon:6256kB active_file:0kB inactive_file:56kB unevictable:0kB present:15164kB pages_scanned:17829 all_unreclaimable? yes
[  664.687670] lowmem_reserve[]: 0 483 483 483
[  664.691974] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:166304kB inactive_anon:181192kB active_file:3472kB inactive_file:14108kB unevictable:16kB present:495008kB pages_scanned:561984 all_unreclaimable? yes
[  664.712637] lowmem_reserve[]: 0 0 0 0
[  664.716412] Node 0 DMA: 21*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[  664.727297] Node 0 DMA32: 83*4kB 9*8kB 17*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2724kB
[  664.738494] 64381 total pagecache pages
[  664.742329] 7902 pages in swap cache
[  664.745909] Swap cache stats: add 174458, delete 166556, find 14826/56928
[  664.752696] Free swap  = 734732kB
[  664.756012] Total swap = 1048568kB
[  664.763953] 131072 pages RAM
[  664.766845] 9628 pages reserved
[  664.769992] 74903 pages shared
[  664.773047] 58244 pages non-shared
[  664.776465] Out of memory: kill process 3514 (run-many-x-apps) score 1094818 or a child
[  664.784464] Killed process 3941 (gnome-help)
[  700.167781] Xorg invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[  700.174355] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[  700.180473] Call Trace:
[  700.182949]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  700.188480]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  700.194247]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  700.199501]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  700.205257]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  700.210748]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  700.217115]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  700.223132]  [<ffffffff810c73f9>] __get_free_pages+0x9/0x50
[  700.228731]  [<ffffffff8110e3c2>] __pollwait+0xc2/0x100
[  700.233966]  [<ffffffff814958c3>] unix_poll+0x23/0xc0
[  700.239025]  [<ffffffff81419a88>] sock_poll+0x18/0x20
[  700.244095]  [<ffffffff8110d969>] do_select+0x3e9/0x730
[  700.249333]  [<ffffffff8110d580>] ? do_select+0x0/0x730
[  700.254575]  [<ffffffff8110e300>] ? __pollwait+0x0/0x100
[  700.259909]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.264976]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.270034]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.275093]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.280157]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.285223]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.290287]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.295360]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.300416]  [<ffffffff8110e400>] ? pollwake+0x0/0x60
[  700.305475]  [<ffffffff8110deaf>] core_sys_select+0x1ff/0x330
[  700.311225]  [<ffffffff8110dcf8>] ? core_sys_select+0x48/0x330
[  700.317068]  [<ffffffffa014954c>] ? i915_gem_throttle_ioctl+0x4c/0x60 [i915]
[  700.324109]  [<ffffffff810fcf9a>] ? do_readv_writev+0x16a/0x1f0
[  700.330037]  [<ffffffff810706bc>] ? getnstimeofday+0x5c/0xf0
[  700.335708]  [<ffffffff8106aca9>] ? ktime_get_ts+0x59/0x60
[  700.341207]  [<ffffffff8110e23a>] sys_select+0x4a/0x110
[  700.346450]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  700.352471] Mem-Info:
[  700.354744] Node 0 DMA per-cpu:
[  700.357931] CPU    0: hi:    0, btch:   1 usd:   0
[  700.362728] CPU    1: hi:    0, btch:   1 usd:   0
[  700.367528] Node 0 DMA32 per-cpu:
[  700.370869] CPU    0: hi:  186, btch:  31 usd: 124
[  700.375681] CPU    1: hi:  186, btch:  31 usd: 109
[  700.380485] Active_anon:42750 active_file:1211 inactive_anon:46836
[  700.380487]  inactive_file:3834 unevictable:4 dirty:0 writeback:0 unstable:0
[  700.380490]  free:1185 slab:13047 mapped:4269 pagetables:6879 bounce:0
[  700.400224] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5504kB inactive_anon:6244kB active_file:4kB inactive_file:20kB unevictable:0kB present:15164kB pages_scanned:21160 all_unreclaimable? no
[  700.419171] lowmem_reserve[]: 0 483 483 483
[  700.423495] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:165496kB inactive_anon:181100kB active_file:4840kB inactive_file:15316kB unevictable:16kB present:495008kB pages_scanned:749440 all_unreclaimable? yes
[  700.444177] lowmem_reserve[]: 0 0 0 0
[  700.447982] Node 0 DMA: 24*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  700.458919] Node 0 DMA32: 95*4kB 7*8kB 15*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2724kB
[  700.470109] 64769 total pagecache pages
[  700.473944] 7685 pages in swap cache
[  700.477521] Swap cache stats: add 174858, delete 167173, find 14884/57219
[  700.484305] Free swap  = 756796kB
[  700.487619] Total swap = 1048568kB
[  700.495533] 131072 pages RAM
[  700.498435] 9628 pages reserved
[  700.501585] 75677 pages shared
[  700.504647] 57992 pages non-shared
[  700.508062] Out of memory: kill process 3514 (run-many-x-apps) score 920259 or a child
[  700.515981] Killed process 3972 (gnome-dictionar)
[  772.754850] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  772.762316] Pid: 3363, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[  772.769042] Call Trace:
[  772.771532]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  772.777056]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  772.782830]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  772.788093]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  772.793861]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  772.799371]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  772.805903]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  772.812044]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  772.817979]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  772.824833]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  772.831934]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  772.837201]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  772.843077]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  772.848298]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  772.854384]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  772.860126]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  772.865693]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  772.870831] Mem-Info:
[  772.873099] Node 0 DMA per-cpu:
[  772.876268] CPU    0: hi:    0, btch:   1 usd:   0
[  772.881052] CPU    1: hi:    0, btch:   1 usd:   0
[  772.885837] Node 0 DMA32 per-cpu:
[  772.889177] CPU    0: hi:  186, btch:  31 usd: 119
[  772.893970] CPU    1: hi:  186, btch:  31 usd: 131
[  772.898771] Active_anon:42925 active_file:967 inactive_anon:46822
[  772.898773]  inactive_file:3951 unevictable:4 dirty:0 writeback:0 unstable:0
[  772.898775]  free:1195 slab:13130 mapped:4261 pagetables:6775 bounce:0
[  772.918425] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5572kB inactive_anon:6228kB active_file:0kB inactive_file:28kB unevictable:0kB present:15164kB pages_scanned:1152 all_unreclaimable? no
[  772.937282] lowmem_reserve[]: 0 483 483 483
[  772.941583] Node 0 DMA32 free:2780kB min:2768kB low:3460kB high:4152kB active_anon:166128kB inactive_anon:181060kB active_file:3868kB inactive_file:15776kB unevictable:16kB present:495008kB pages_scanned:31168 all_unreclaimable? no
[  772.962096] lowmem_reserve[]: 0 0 0 0
[  772.965848] Node 0 DMA: 19*4kB 3*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[  772.976695] Node 0 DMA32: 113*4kB 7*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2780kB
[  772.987966] 64559 total pagecache pages
[  772.991800] 7639 pages in swap cache
[  772.995376] Swap cache stats: add 175606, delete 167967, find 14965/57706
[  773.002155] Free swap  = 761820kB
[  773.005474] Total swap = 1048568kB
[  773.012974] 131072 pages RAM
[  773.015871] 9628 pages reserved
[  773.019017] 75524 pages shared
[  773.022066] 57891 pages non-shared
[  773.025474] Out of memory: kill process 3514 (run-many-x-apps) score 892555 or a child
[  773.033387] Killed process 4039 (sol)
[  794.790990] NFS: Server wrote zero bytes, expected 120.
[  822.483490] Xorg invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  822.490772] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[  822.496918] Call Trace:
[  822.499384]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  822.504871]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  822.510622]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  822.515851]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  822.521593]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  822.527081]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  822.533429]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  822.539434]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  822.545175]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  822.551788]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  822.558481]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  822.563528]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  822.569098]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  822.574327]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  822.580413]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  822.586157]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  822.591727]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  822.596859] Mem-Info:
[  822.599136] Node 0 DMA per-cpu:
[  822.602299] CPU    0: hi:    0, btch:   1 usd:   0
[  822.607084] CPU    1: hi:    0, btch:   1 usd:   0
[  822.611869] Node 0 DMA32 per-cpu:
[  822.615198] CPU    0: hi:  186, btch:  31 usd:  91
[  822.619985] CPU    1: hi:  186, btch:  31 usd:  98
[  822.624773] Active_anon:43566 active_file:835 inactive_anon:46874
[  822.624774]  inactive_file:3327 unevictable:4 dirty:0 writeback:0 unstable:0
[  822.624775]  free:1187 slab:13349 mapped:3843 pagetables:6679 bounce:0
[  822.644402] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5648kB inactive_anon:6260kB active_file:24kB inactive_file:72kB unevictable:0kB present:15164kB pages_scanned:20672 all_unreclaimable? yes
[  822.663507] lowmem_reserve[]: 0 483 483 483
[  822.667773] Node 0 DMA32 free:2748kB min:2768kB low:3460kB high:4152kB active_anon:168616kB inactive_anon:181236kB active_file:3316kB inactive_file:13236kB unevictable:16kB present:495008kB pages_scanned:729026 all_unreclaimable? yes
[  822.688432] lowmem_reserve[]: 0 0 0 0
[  822.692178] Node 0 DMA: 16*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[  822.703015] Node 0 DMA32: 53*4kB 31*8kB 15*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2748kB
[  822.714282] 63870 total pagecache pages
[  822.718120] 7714 pages in swap cache
[  822.721687] Swap cache stats: add 177378, delete 169664, find 15255/58971
[  822.728470] Free swap  = 772080kB
[  822.731787] Total swap = 1048568kB
[  822.738767] 131072 pages RAM
[  822.741648] 9628 pages reserved
[  822.744800] 78480 pages shared
[  822.747857] 58328 pages non-shared
[  822.751262] Out of memory: kill process 3514 (run-many-x-apps) score 874039 or a child
[  822.759173] Killed process 4071 (gnometris)
[  838.434074] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  838.441560] Pid: 5500, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[  838.448286] Call Trace:
[  838.450770]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  838.456279]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  838.462053]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  838.467299]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  838.473064]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  838.478570]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  838.484930]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  838.490953]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  838.496714]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  838.503346]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  838.510056]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  838.515121]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  838.520707]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  838.525955]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  838.532058]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  838.537819]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  838.543405]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  838.548553] Mem-Info:
[  838.550844] Node 0 DMA per-cpu:
[  838.554023] CPU    0: hi:    0, btch:   1 usd:   0
[  838.558818] CPU    1: hi:    0, btch:   1 usd:   0
[  838.563614] Node 0 DMA32 per-cpu:
[  838.566959] CPU    0: hi:  186, btch:  31 usd: 174
[  838.571767] CPU    1: hi:  186, btch:  31 usd:  87
[  838.576579] Active_anon:43520 active_file:718 inactive_anon:46874
[  838.576582]  inactive_file:3607 unevictable:4 dirty:0 writeback:0 unstable:0
[  838.576584]  free:1193 slab:13228 mapped:4138 pagetables:6608 bounce:0
[  838.596232] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5620kB inactive_anon:6260kB active_file:28kB inactive_file:72kB unevictable:0kB present:15164kB pages_scanned:18848 all_unreclaimable? yes
[  838.615367] lowmem_reserve[]: 0 483 483 483
[  838.619678] Node 0 DMA32 free:2764kB min:2768kB low:3460kB high:4152kB active_anon:168460kB inactive_anon:181236kB active_file:2844kB inactive_file:14356kB unevictable:16kB present:495008kB pages_scanned:585548 all_unreclaimable? yes
[  838.640372] lowmem_reserve[]: 0 0 0 0
[  838.644163] Node 0 DMA: 18*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[  838.655125] Node 0 DMA32: 109*4kB 7*8kB 16*16kB 14*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2732kB
[  838.666499] 64009 total pagecache pages
[  838.670350] 7656 pages in swap cache
[  838.673941] Swap cache stats: add 177561, delete 169905, find 15273/59126
[  838.680734] Free swap  = 791892kB
[  838.684060] Total swap = 1048568kB
[  838.694532] 131072 pages RAM
[  838.697436] 9628 pages reserved
[  838.700590] 73594 pages shared
[  838.703661] 58166 pages non-shared
[  838.707076] Out of memory: kill process 3514 (run-many-x-apps) score 853023 or a child
[  838.714995] Killed process 4104 (gnect)
[  889.461532] scim-panel-gtk invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  889.469205] Pid: 3360, comm: scim-panel-gtk Not tainted 2.6.30-rc8-mm1 #301
[  889.476177] Call Trace:
[  889.478662]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  889.484172]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  889.489944]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  889.495191]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  889.500962]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  889.506455]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  889.512814]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  889.518831]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  889.524591]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  889.531220]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  889.537930]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  889.542994]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  889.548580]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  889.553829]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  889.559928]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  889.565694]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  889.571281]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  889.576428] Mem-Info:
[  889.578716] Node 0 DMA per-cpu:
[  889.581897] CPU    0: hi:    0, btch:   1 usd:   0
[  889.586693] CPU    1: hi:    0, btch:   1 usd:   0
[  889.591489] Node 0 DMA32 per-cpu:
[  889.594838] CPU    0: hi:  186, btch:  31 usd:  27
[  889.599639] CPU    1: hi:  186, btch:  31 usd:  52
[  889.604447] Active_anon:43571 active_file:1739 inactive_anon:47198
[  889.604450]  inactive_file:2522 unevictable:4 dirty:0 writeback:0 unstable:0
[  889.604453]  free:1172 slab:13250 mapped:4789 pagetables:6476 bounce:0
[  889.624188] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5672kB inactive_anon:6228kB active_file:0kB inactive_file:28kB unevictable:0kB present:15164kB pages_scanned:18758 all_unreclaimable? yes
[  889.643237] lowmem_reserve[]: 0 483 483 483
[  889.647549] Node 0 DMA32 free:2676kB min:2768kB low:3460kB high:4152kB active_anon:168612kB inactive_anon:182564kB active_file:6956kB inactive_file:10060kB unevictable:16kB present:495008kB pages_scanned:562004 all_unreclaimable? yes
[  889.668244] lowmem_reserve[]: 0 0 0 0
[  889.672043] Node 0 DMA: 19*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  889.683006] Node 0 DMA32: 85*4kB 8*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2676kB
[  889.694298] 63465 total pagecache pages
[  889.698147] 7133 pages in swap cache
[  889.701736] Swap cache stats: add 181169, delete 174036, find 15337/60473
[  889.708527] Free swap  = 795216kB
[  889.711853] Total swap = 1048568kB
[  889.722306] 131072 pages RAM
[  889.725220] 9628 pages reserved
[  889.728368] 73642 pages shared
[  889.731430] 58217 pages non-shared
[  889.734842] Out of memory: kill process 3314 (gnome-session) score 875272 or a child
[  889.742589] Killed process 3345 (ssh-agent)
[  889.753188] urxvt invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  889.760064] Pid: 3364, comm: urxvt Not tainted 2.6.30-rc8-mm1 #301
[  889.766248] Call Trace:
[  889.768709]  [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[  889.774212]  [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[  889.779963]  [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[  889.785202]  [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[  889.790961]  [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[  889.796460]  [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  889.802839]  [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[  889.808867]  [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[  889.814622]  [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[  889.821253]  [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[  889.827970]  [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[  889.833050]  [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[  889.838635]  [<ffffffff810d9883>] __do_fault+0x53/0x510
[  889.843875]  [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[  889.849989]  [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[  889.855753]  [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[  889.861356]  [<ffffffff81545a55>] page_fault+0x25/0x30
[  889.866503] Mem-Info:
[  889.868779] Node 0 DMA per-cpu:
[  889.871969] CPU    0: hi:    0, btch:   1 usd:   0
[  889.876771] CPU    1: hi:    0, btch:   1 usd:   0
[  889.881590] Node 0 DMA32 per-cpu:
[  889.884950] CPU    0: hi:  186, btch:  31 usd:  27
[  889.889752] CPU    1: hi:  186, btch:  31 usd:  83
[  889.894557] Active_anon:43568 active_file:1748 inactive_anon:47202
[  889.894560]  inactive_file:2532 unevictable:4 dirty:0 writeback:0 unstable:0
[  889.894562]  free:1172 slab:13256 mapped:4800 pagetables:6457 bounce:0
[  889.914305] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5672kB inactive_anon:6244kB active_file:16kB inactive_file:36kB unevictable:0kB present:15164kB pages_scanned:18758 all_unreclaimable? yes
[  889.933431] lowmem_reserve[]: 0 483 483 483
[  889.937757] Node 0 DMA32 free:2676kB min:2768kB low:3460kB high:4152kB active_anon:168600kB inactive_anon:182564kB active_file:6976kB inactive_file:10092kB unevictable:16kB present:495008kB pages_scanned:572756 all_unreclaimable? yes
[  889.958441] lowmem_reserve[]: 0 0 0 0
[  889.962251] Node 0 DMA: 19*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  889.973218] Node 0 DMA32: 85*4kB 8*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2676kB
[  889.984510] 63470 total pagecache pages
[  889.988363] 7128 pages in swap cache
[  889.991956] Swap cache stats: add 181169, delete 174041, find 15337/60473
[  889.998764] Free swap  = 795628kB
[  890.002089] Total swap = 1048568kB
[  890.012112] 131072 pages RAM
[  890.015034] 9628 pages reserved
[  890.018197] 73633 pages shared
[  890.021274] 58191 pages non-shared
[  890.024686] Out of memory: kill process 3314 (gnome-session) score 870770 or a child
[  890.032441] Killed process 3363 (firefox-bin)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-09 19:37   ` Johannes Weiner
@ 2009-06-10  6:39     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 55+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-10  6:39 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Tue, 9 Jun 2009 21:37:02 +0200
Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > [resend with lists cc'd, sorry]
> 
> [and fixed Hugh's email.  crap]
> 
> > Hi,
> > 
> > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > suggestion, I moved the pte collecting to the callsite and thus out
> > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > array of that many swap entries on the stack, but I think it is better
> > to limit the cluster size to a sane maximum than using dynamic
> > allocation for this purpose.
> > 
> > Thanks all for the helpful suggestions.  KAMEZAWA-san and Minchan, I
> > didn't incorporate your ideas in this patch as I think they belong in
> > a different one with their own justifications.  I didn't ignore them.
> > 
> >        Hannes
> > 
> > ---
> > The current swap readahead implementation reads a physically
> > contiguous group of swap slots around the faulting page to take
> > advantage of the disk head's position and in the hope that the
> > surrounding pages will be needed soon as well.
> > 
> > This works as long as the physical swap slot order approximates the
> > LRU order decently, otherwise it wastes memory and IO bandwidth to
> > read in pages that are unlikely to be needed soon.
> > 
> > However, the physical swap slot layout diverges from the LRU order
> > with increasing swap activity, i.e. high memory pressure situations,
> > and this is exactly the situation where swapin should not waste any
> > memory or IO bandwidth as both are the most contended resources at
> > this point.
> > 
> > Another approximation for LRU-relation is the VMA order as groups of
> > VMA-related pages are usually used together.
> > 
> > This patch combines both the physical and the virtual hint to get a
> > good approximation of pages that are sensible to read ahead.
> > 
> > When both diverge, we either read unrelated data, seek heavily for
> > related data, or, what this patch does, just decrease the readahead
> > efforts.
> > 
> > To achieve this, we have essentially two readahead windows of the same
> > size: one spans the virtual, the other one the physical neighborhood
> > of the faulting page.  We only read where both areas overlap.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > Reviewed-by: Rik van Riel <riel@redhat.com>
> > Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > Cc: Andi Kleen <andi@firstfloor.org>
> > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Cc: Minchan Kim <minchan.kim@gmail.com>
> > ---
> >  include/linux/swap.h |    4 ++-
> >  kernel/sysctl.c      |    7 ++++-
> >  mm/memory.c          |   55 +++++++++++++++++++++++++++++++++++++++++
> >  mm/shmem.c           |    4 +--
> >  mm/swap_state.c      |   67 ++++++++++++++++++++++++++++++++++++++-------------
> >  5 files changed, 116 insertions(+), 21 deletions(-)
> > 
> > version 3:
> >   o move pte selection to callee (per Hugh)
> >   o limit ra ptes to one pmd entry to avoid multiple
> >     locking/mapping of highptes (per Hugh)
> > 
> > version 2:
> >   o fall back to physical ra window for shmem
> >   o add documentation to the new ra algorithm (per Andrew)
> > 
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -327,27 +327,14 @@ struct page *read_swap_cache_async(swp_e
> >  	return found_page;
> >  }
> >  
> > -/**
> > - * swapin_readahead - swap in pages in hope we need them soon
> > - * @entry: swap entry of this memory
> > - * @gfp_mask: memory allocation flags
> > - * @vma: user vma this address belongs to
> > - * @addr: target address for mempolicy
> > - *
> > - * Returns the struct page for entry and addr, after queueing swapin.
> > - *
> > +/*
> >   * Primitive swap readahead code. We simply read an aligned block of
> >   * (1 << page_cluster) entries in the swap area. This method is chosen
> >   * because it doesn't cost us any seek time.  We also make sure to queue
> >   * the 'original' request together with the readahead ones...
> > - *
> > - * This has been extended to use the NUMA policies from the mm triggering
> > - * the readahead.
> > - *
> > - * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
> >   */
> > -struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > -			struct vm_area_struct *vma, unsigned long addr)
> > +static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
> > +				struct vm_area_struct *vma, unsigned long addr)
> >  {
> >  	int nr_pages;
> >  	struct page *page;
> > @@ -373,3 +360,51 @@ struct page *swapin_readahead(swp_entry_
> >  	lru_add_drain();	/* Push any new pages onto the LRU now */
> >  	return read_swap_cache_async(entry, gfp_mask, vma, addr);
> >  }
> > +
> > +/**
> > + * swapin_readahead - swap in pages in hope we need them soon
> > + * @entry: swap entry of this memory
> > + * @gfp_mask: memory allocation flags
> > + * @vma: user vma this address belongs to
> > + * @addr: target address for mempolicy
> > + * @entries: swap slots to consider reading
> > + * @nr_entries: number of @entries
> > + * @cluster: readahead window size in swap slots
> > + *
> > + * Returns the struct page for entry and addr, after queueing swapin.
> > + *
> > + * This has been extended to use the NUMA policies from the mm
> > + * triggering the readahead.
> > + *
> > + * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
> > + */
> > +struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > +			struct vm_area_struct *vma, unsigned long addr,
> > +			swp_entry_t *entries, int nr_entries,
> > +			unsigned long cluster)
> > +{
> > +	unsigned long pmin, pmax;
> > +	int i;
> > +
> > +	if (!entries)	/* XXX: shmem case */
> > +		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> > +	pmin = swp_offset(entry) & ~(cluster - 1);
> > +	pmax = pmin + cluster;
> > +	for (i = 0; i < nr_entries; i++) {
> > +		swp_entry_t swp = entries[i];
> > +		struct page *page;
> > +
> > +		if (swp_type(swp) != swp_type(entry))
> > +			continue;
> > +		if (swp_offset(swp) > pmax)
> > +			continue;
> > +		if (swp_offset(swp) < pmin)
> > +			continue;
> > +		page = read_swap_cache_async(swp, gfp_mask, vma, addr);
> > +		if (!page)
> > +			break;
> > +		page_cache_release(page);
> > +	}
> > +	lru_add_drain();	/* Push any new pages onto the LRU now */
> > +	return read_swap_cache_async(entry, gfp_mask, vma, addr);
> > +}
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -292,7 +292,9 @@ extern struct page *lookup_swap_cache(sw
> >  extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
> >  			struct vm_area_struct *vma, unsigned long addr);
> >  extern struct page *swapin_readahead(swp_entry_t, gfp_t,
> > -			struct vm_area_struct *vma, unsigned long addr);
> > +			struct vm_area_struct *vma, unsigned long addr,
> > +			swp_entry_t *entries, int nr_entries,
> > +			unsigned long cluster);
> >  
> >  /* linux/mm/swapfile.c */
> >  extern long nr_swap_pages;
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2440,6 +2440,54 @@ int vmtruncate_range(struct inode *inode
> >  }
> >  
> >  /*
> > + * The readahead window is the virtual area around the faulting page,
> > + * where the physical proximity of the swap slots is taken into
> > + * account as well in swapin_readahead().
> > + *
> > + * While the swap allocation algorithm tries to keep LRU-related pages
> > + * together on the swap backing, it is not reliable on heavy thrashing
> > + * systems where concurrent reclaimers allocate swap slots and/or most
> > + * anonymous memory pages are already in swap cache.
> > + *
> > + * On the virtual side, subgroups of VMA-related pages are usually
> > + * used together, which gives another hint to LRU relationship.
> > + *
> > + * By taking both aspects into account, we get a good approximation of
> > + * which pages are sensible to read together with the faulting one.
> > + */
> > +static int swap_readahead_ptes(struct mm_struct *mm,
> > +			unsigned long addr, pmd_t *pmd,
> > +			swp_entry_t *entries,
> > +			unsigned long cluster)
> > +{
> > +	unsigned long window, min, max, limit;
> > +	spinlock_t *ptl;
> > +	pte_t *ptep;
> > +	int i, nr;
> > +
> > +	window = cluster << PAGE_SHIFT;
> > +	min = addr & ~(window - 1);
> > +	max = min + cluster;

Hmm, max = min + window ?

Thanks,
-Kame

> > +	/*
> > +	 * To keep the locking/highpte mapping simple, stay
> > +	 * within the PTE range of one PMD entry.
> > +	 */
> > +	limit = addr & PMD_MASK;
> > +	if (limit > min)
> > +		min = limit;
> > +	limit = pmd_addr_end(addr, max);
> > +	if (limit < max)
> > +		max = limit;
> > +	limit = max - min;
> > +	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
> > +	for (i = nr = 0; i < limit; i++)
> > +		if (is_swap_pte(ptep[i]))
> > +			entries[nr++] = pte_to_swp_entry(ptep[i]);
> > +	pte_unmap_unlock(ptep, ptl);
> > +	return nr;
> > +}
> > +
> > +/*
> >   * We enter with non-exclusive mmap_sem (to exclude vma changes,
> >   * but allow concurrent faults), and pte mapped but not yet locked.
> >   * We return with mmap_sem still held, but pte unmapped and unlocked.
> > @@ -2466,9 +2514,14 @@ static int do_swap_page(struct mm_struct
> >  	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> >  	page = lookup_swap_cache(entry);
> >  	if (!page) {
> > +		int nr, cluster = 1 << page_cluster;
> > +		swp_entry_t entries[cluster];
> > +
> >  		grab_swap_token(); /* Contend for token _before_ read-in */
> > +		nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
> >  		page = swapin_readahead(entry,
> > -					GFP_HIGHUSER_MOVABLE, vma, address);
> > +					GFP_HIGHUSER_MOVABLE, vma, address,
> > +					entries, nr, cluster);
> >  		if (!page) {
> >  			/*
> >  			 * Back out if somebody else faulted in this pte
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1148,7 +1148,7 @@ static struct page *shmem_swapin(swp_ent
> >  	pvma.vm_pgoff = idx;
> >  	pvma.vm_ops = NULL;
> >  	pvma.vm_policy = spol;
> > -	page = swapin_readahead(entry, gfp, &pvma, 0);
> > +	page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
> >  	return page;
> >  }
> >  
> > @@ -1178,7 +1178,7 @@ static inline void shmem_show_mpol(struc
> >  static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
> >  			struct shmem_inode_info *info, unsigned long idx)
> >  {
> > -	return swapin_readahead(entry, gfp, NULL, 0);
> > +	return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
> >  }
> >  
> >  static inline struct page *shmem_alloc_page(gfp_t gfp,
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -112,6 +112,8 @@ static int min_percpu_pagelist_fract = 8
> >  
> >  static int ngroups_max = NGROUPS_MAX;
> >  
> > +static int page_cluster_max = 5;
> > +
> >  #ifdef CONFIG_MODULES
> >  extern char modprobe_path[];
> >  #endif
> > @@ -966,7 +968,10 @@ static struct ctl_table vm_table[] = {
> >  		.data		= &page_cluster,
> >  		.maxlen		= sizeof(int),
> >  		.mode		= 0644,
> > -		.proc_handler	= &proc_dointvec,
> > +		.proc_handler	= &proc_dointvec_minmax,
> > +		.strategy	= &sysctl_intvec,
> > +		.extra1		= &zero,
> > +		.extra2		= &page_cluster_max,
> >  	},
> >  	{
> >  		.ctl_name	= VM_DIRTY_BACKGROUND,
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10  6:39     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 55+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-10  6:39 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Tue, 9 Jun 2009 21:37:02 +0200
Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > [resend with lists cc'd, sorry]
> 
> [and fixed Hugh's email.  crap]
> 
> > Hi,
> > 
> > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > suggestion, I moved the pte collecting to the callsite and thus out
> > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > array of that many swap entries on the stack, but I think it is better
> > to limit the cluster size to a sane maximum than using dynamic
> > allocation for this purpose.
> > 
> > Thanks all for the helpful suggestions.  KAMEZAWA-san and Minchan, I
> > didn't incorporate your ideas in this patch as I think they belong in
> > a different one with their own justifications.  I didn't ignore them.
> > 
> >        Hannes
> > 
> > ---
> > The current swap readahead implementation reads a physically
> > contiguous group of swap slots around the faulting page to take
> > advantage of the disk head's position and in the hope that the
> > surrounding pages will be needed soon as well.
> > 
> > This works as long as the physical swap slot order approximates the
> > LRU order decently, otherwise it wastes memory and IO bandwidth to
> > read in pages that are unlikely to be needed soon.
> > 
> > However, the physical swap slot layout diverges from the LRU order
> > with increasing swap activity, i.e. high memory pressure situations,
> > and this is exactly the situation where swapin should not waste any
> > memory or IO bandwidth as both are the most contended resources at
> > this point.
> > 
> > Another approximation for LRU-relation is the VMA order as groups of
> > VMA-related pages are usually used together.
> > 
> > This patch combines both the physical and the virtual hint to get a
> > good approximation of pages that are sensible to read ahead.
> > 
> > When both diverge, we either read unrelated data, seek heavily for
> > related data, or, what this patch does, just decrease the readahead
> > efforts.
> > 
> > To achieve this, we have essentially two readahead windows of the same
> > size: one spans the virtual, the other one the physical neighborhood
> > of the faulting page.  We only read where both areas overlap.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > Reviewed-by: Rik van Riel <riel@redhat.com>
> > Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > Cc: Andi Kleen <andi@firstfloor.org>
> > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Cc: Minchan Kim <minchan.kim@gmail.com>
> > ---
> >  include/linux/swap.h |    4 ++-
> >  kernel/sysctl.c      |    7 ++++-
> >  mm/memory.c          |   55 +++++++++++++++++++++++++++++++++++++++++
> >  mm/shmem.c           |    4 +--
> >  mm/swap_state.c      |   67 ++++++++++++++++++++++++++++++++++++++-------------
> >  5 files changed, 116 insertions(+), 21 deletions(-)
> > 
> > version 3:
> >   o move pte selection to callee (per Hugh)
> >   o limit ra ptes to one pmd entry to avoid multiple
> >     locking/mapping of highptes (per Hugh)
> > 
> > version 2:
> >   o fall back to physical ra window for shmem
> >   o add documentation to the new ra algorithm (per Andrew)
> > 
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -327,27 +327,14 @@ struct page *read_swap_cache_async(swp_e
> >  	return found_page;
> >  }
> >  
> > -/**
> > - * swapin_readahead - swap in pages in hope we need them soon
> > - * @entry: swap entry of this memory
> > - * @gfp_mask: memory allocation flags
> > - * @vma: user vma this address belongs to
> > - * @addr: target address for mempolicy
> > - *
> > - * Returns the struct page for entry and addr, after queueing swapin.
> > - *
> > +/*
> >   * Primitive swap readahead code. We simply read an aligned block of
> >   * (1 << page_cluster) entries in the swap area. This method is chosen
> >   * because it doesn't cost us any seek time.  We also make sure to queue
> >   * the 'original' request together with the readahead ones...
> > - *
> > - * This has been extended to use the NUMA policies from the mm triggering
> > - * the readahead.
> > - *
> > - * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
> >   */
> > -struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > -			struct vm_area_struct *vma, unsigned long addr)
> > +static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
> > +				struct vm_area_struct *vma, unsigned long addr)
> >  {
> >  	int nr_pages;
> >  	struct page *page;
> > @@ -373,3 +360,51 @@ struct page *swapin_readahead(swp_entry_
> >  	lru_add_drain();	/* Push any new pages onto the LRU now */
> >  	return read_swap_cache_async(entry, gfp_mask, vma, addr);
> >  }
> > +
> > +/**
> > + * swapin_readahead - swap in pages in hope we need them soon
> > + * @entry: swap entry of this memory
> > + * @gfp_mask: memory allocation flags
> > + * @vma: user vma this address belongs to
> > + * @addr: target address for mempolicy
> > + * @entries: swap slots to consider reading
> > + * @nr_entries: number of @entries
> > + * @cluster: readahead window size in swap slots
> > + *
> > + * Returns the struct page for entry and addr, after queueing swapin.
> > + *
> > + * This has been extended to use the NUMA policies from the mm
> > + * triggering the readahead.
> > + *
> > + * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
> > + */
> > +struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > +			struct vm_area_struct *vma, unsigned long addr,
> > +			swp_entry_t *entries, int nr_entries,
> > +			unsigned long cluster)
> > +{
> > +	unsigned long pmin, pmax;
> > +	int i;
> > +
> > +	if (!entries)	/* XXX: shmem case */
> > +		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> > +	pmin = swp_offset(entry) & ~(cluster - 1);
> > +	pmax = pmin + cluster;
> > +	for (i = 0; i < nr_entries; i++) {
> > +		swp_entry_t swp = entries[i];
> > +		struct page *page;
> > +
> > +		if (swp_type(swp) != swp_type(entry))
> > +			continue;
> > +		if (swp_offset(swp) > pmax)
> > +			continue;
> > +		if (swp_offset(swp) < pmin)
> > +			continue;
> > +		page = read_swap_cache_async(swp, gfp_mask, vma, addr);
> > +		if (!page)
> > +			break;
> > +		page_cache_release(page);
> > +	}
> > +	lru_add_drain();	/* Push any new pages onto the LRU now */
> > +	return read_swap_cache_async(entry, gfp_mask, vma, addr);
> > +}
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -292,7 +292,9 @@ extern struct page *lookup_swap_cache(sw
> >  extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
> >  			struct vm_area_struct *vma, unsigned long addr);
> >  extern struct page *swapin_readahead(swp_entry_t, gfp_t,
> > -			struct vm_area_struct *vma, unsigned long addr);
> > +			struct vm_area_struct *vma, unsigned long addr,
> > +			swp_entry_t *entries, int nr_entries,
> > +			unsigned long cluster);
> >  
> >  /* linux/mm/swapfile.c */
> >  extern long nr_swap_pages;
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2440,6 +2440,54 @@ int vmtruncate_range(struct inode *inode
> >  }
> >  
> >  /*
> > + * The readahead window is the virtual area around the faulting page,
> > + * where the physical proximity of the swap slots is taken into
> > + * account as well in swapin_readahead().
> > + *
> > + * While the swap allocation algorithm tries to keep LRU-related pages
> > + * together on the swap backing, it is not reliable on heavy thrashing
> > + * systems where concurrent reclaimers allocate swap slots and/or most
> > + * anonymous memory pages are already in swap cache.
> > + *
> > + * On the virtual side, subgroups of VMA-related pages are usually
> > + * used together, which gives another hint to LRU relationship.
> > + *
> > + * By taking both aspects into account, we get a good approximation of
> > + * which pages are sensible to read together with the faulting one.
> > + */
> > +static int swap_readahead_ptes(struct mm_struct *mm,
> > +			unsigned long addr, pmd_t *pmd,
> > +			swp_entry_t *entries,
> > +			unsigned long cluster)
> > +{
> > +	unsigned long window, min, max, limit;
> > +	spinlock_t *ptl;
> > +	pte_t *ptep;
> > +	int i, nr;
> > +
> > +	window = cluster << PAGE_SHIFT;
> > +	min = addr & ~(window - 1);
> > +	max = min + cluster;

Hmm, max = min + window ?

Thanks,
-Kame

> > +	/*
> > +	 * To keep the locking/highpte mapping simple, stay
> > +	 * within the PTE range of one PMD entry.
> > +	 */
> > +	limit = addr & PMD_MASK;
> > +	if (limit > min)
> > +		min = limit;
> > +	limit = pmd_addr_end(addr, max);
> > +	if (limit < max)
> > +		max = limit;
> > +	limit = max - min;
> > +	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
> > +	for (i = nr = 0; i < limit; i++)
> > +		if (is_swap_pte(ptep[i]))
> > +			entries[nr++] = pte_to_swp_entry(ptep[i]);
> > +	pte_unmap_unlock(ptep, ptl);
> > +	return nr;
> > +}
> > +
> > +/*
> >   * We enter with non-exclusive mmap_sem (to exclude vma changes,
> >   * but allow concurrent faults), and pte mapped but not yet locked.
> >   * We return with mmap_sem still held, but pte unmapped and unlocked.
> > @@ -2466,9 +2514,14 @@ static int do_swap_page(struct mm_struct
> >  	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> >  	page = lookup_swap_cache(entry);
> >  	if (!page) {
> > +		int nr, cluster = 1 << page_cluster;
> > +		swp_entry_t entries[cluster];
> > +
> >  		grab_swap_token(); /* Contend for token _before_ read-in */
> > +		nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
> >  		page = swapin_readahead(entry,
> > -					GFP_HIGHUSER_MOVABLE, vma, address);
> > +					GFP_HIGHUSER_MOVABLE, vma, address,
> > +					entries, nr, cluster);
> >  		if (!page) {
> >  			/*
> >  			 * Back out if somebody else faulted in this pte
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1148,7 +1148,7 @@ static struct page *shmem_swapin(swp_ent
> >  	pvma.vm_pgoff = idx;
> >  	pvma.vm_ops = NULL;
> >  	pvma.vm_policy = spol;
> > -	page = swapin_readahead(entry, gfp, &pvma, 0);
> > +	page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
> >  	return page;
> >  }
> >  
> > @@ -1178,7 +1178,7 @@ static inline void shmem_show_mpol(struc
> >  static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
> >  			struct shmem_inode_info *info, unsigned long idx)
> >  {
> > -	return swapin_readahead(entry, gfp, NULL, 0);
> > +	return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
> >  }
> >  
> >  static inline struct page *shmem_alloc_page(gfp_t gfp,
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -112,6 +112,8 @@ static int min_percpu_pagelist_fract = 8
> >  
> >  static int ngroups_max = NGROUPS_MAX;
> >  
> > +static int page_cluster_max = 5;
> > +
> >  #ifdef CONFIG_MODULES
> >  extern char modprobe_path[];
> >  #endif
> > @@ -966,7 +968,10 @@ static struct ctl_table vm_table[] = {
> >  		.data		= &page_cluster,
> >  		.maxlen		= sizeof(int),
> >  		.mode		= 0644,
> > -		.proc_handler	= &proc_dointvec,
> > +		.proc_handler	= &proc_dointvec_minmax,
> > +		.strategy	= &sysctl_intvec,
> > +		.extra1		= &zero,
> > +		.extra2		= &page_cluster_max,
> >  	},
> >  	{
> >  		.ctl_name	= VM_DIRTY_BACKGROUND,
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10  5:03     ` Wu Fengguang
@ 2009-06-10  7:45       ` Johannes Weiner
  -1 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-10  7:45 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel

Hi Fengguang,

On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > [resend with lists cc'd, sorry]
> > 
> > [and fixed Hugh's email.  crap]
> > 
> > > Hi,
> > > 
> > > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > > suggestion, I moved the pte collecting to the callsite and thus out
> > > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > > array of that many swap entries on the stack, but I think it is better
> > > to limit the cluster size to a sane maximum than using dynamic
> > > allocation for this purpose.
> 
> Hi Johannes,
> 
> When stress testing your patch, I found it triggered many OOM kills.
> Around the time of last OOMs, the memory usage is:
> 
>              total       used       free     shared    buffers     cached
> Mem:           474        468          5          0          0        239
> -/+ buffers/cache:        229        244
> Swap:         1023        221        802

Wow, that really confused me for a second as we shouldn't read more
pages ahead than without the patch, probably even less under stress.

So the problem has to be a runaway reading.  And indeed, severe
stupidity here:

+       window = cluster << PAGE_SHIFT;
+       min = addr & ~(window - 1);
+       max = min + cluster;
+       /*
+        * To keep the locking/highpte mapping simple, stay
+        * within the PTE range of one PMD entry.
+        */
+       limit = addr & PMD_MASK;
+       if (limit > min)
+               min = limit;
+       limit = pmd_addr_end(addr, max);
+       if (limit < max)
+               max = limit;
+       limit = max - min;

The mistake is at the initial calculation of max.  It should be

	max = min + window;

The resulting problem is that min could get bigger than max when
cluster is bigger than PMD_SHIFT.  Did you use page_cluster == 5?

The initial min is aligned to a value below the PMD boundary and max
based on it with a too small offset, staying below the PMD boundary as
well.  When min is rounded up, this becomes a bit large:

	limit = max - min;

So if my brain is already functioning, fixing the initial max should
be enough because either

	o window is smaller than PMD_SIZE, than we won't round down
	below a PMD boundary in the first place or

	o window is bigger than PMD_SIZE, than we can round down below
	a PMD boundary but adding window to that is garuanteed to
	cross the boundary again

and thus max is always bigger than min.

Fengguang, does this make sense?  If so, the patch below should fix
it.

Thank you,

	Hannes

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2467,7 +2467,7 @@ static int swap_readahead_ptes(struct mm
 
 	window = cluster << PAGE_SHIFT;
 	min = addr & ~(window - 1);
-	max = min + cluster;
+	max = min + window;
 	/*
 	 * To keep the locking/highpte mapping simple, stay
 	 * within the PTE range of one PMD entry.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10  7:45       ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-10  7:45 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel

Hi Fengguang,

On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > [resend with lists cc'd, sorry]
> > 
> > [and fixed Hugh's email.  crap]
> > 
> > > Hi,
> > > 
> > > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > > suggestion, I moved the pte collecting to the callsite and thus out
> > > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > > array of that many swap entries on the stack, but I think it is better
> > > to limit the cluster size to a sane maximum than using dynamic
> > > allocation for this purpose.
> 
> Hi Johannes,
> 
> When stress testing your patch, I found it triggered many OOM kills.
> Around the time of last OOMs, the memory usage is:
> 
>              total       used       free     shared    buffers     cached
> Mem:           474        468          5          0          0        239
> -/+ buffers/cache:        229        244
> Swap:         1023        221        802

Wow, that really confused me for a second as we shouldn't read more
pages ahead than without the patch, probably even less under stress.

So the problem has to be a runaway reading.  And indeed, severe
stupidity here:

+       window = cluster << PAGE_SHIFT;
+       min = addr & ~(window - 1);
+       max = min + cluster;
+       /*
+        * To keep the locking/highpte mapping simple, stay
+        * within the PTE range of one PMD entry.
+        */
+       limit = addr & PMD_MASK;
+       if (limit > min)
+               min = limit;
+       limit = pmd_addr_end(addr, max);
+       if (limit < max)
+               max = limit;
+       limit = max - min;

The mistake is at the initial calculation of max.  It should be

	max = min + window;

The resulting problem is that min could get bigger than max when
cluster is bigger than PMD_SHIFT.  Did you use page_cluster == 5?

The initial min is aligned to a value below the PMD boundary and max
based on it with a too small offset, staying below the PMD boundary as
well.  When min is rounded up, this becomes a bit large:

	limit = max - min;

So if my brain is already functioning, fixing the initial max should
be enough because either

	o window is smaller than PMD_SIZE, than we won't round down
	below a PMD boundary in the first place or

	o window is bigger than PMD_SIZE, than we can round down below
	a PMD boundary but adding window to that is garuanteed to
	cross the boundary again

and thus max is always bigger than min.

Fengguang, does this make sense?  If so, the patch below should fix
it.

Thank you,

	Hannes

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2467,7 +2467,7 @@ static int swap_readahead_ptes(struct mm
 
 	window = cluster << PAGE_SHIFT;
 	min = addr & ~(window - 1);
-	max = min + cluster;
+	max = min + window;
 	/*
 	 * To keep the locking/highpte mapping simple, stay
 	 * within the PTE range of one PMD entry.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10  7:45       ` Johannes Weiner
@ 2009-06-10  8:11         ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-10  8:11 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel

On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> Hi Fengguang,
> 
> On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > [resend with lists cc'd, sorry]
> > > 
> > > [and fixed Hugh's email.  crap]
> > > 
> > > > Hi,
> > > > 
> > > > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > > > array of that many swap entries on the stack, but I think it is better
> > > > to limit the cluster size to a sane maximum than using dynamic
> > > > allocation for this purpose.
> > 
> > Hi Johannes,
> > 
> > When stress testing your patch, I found it triggered many OOM kills.
> > Around the time of last OOMs, the memory usage is:
> > 
> >              total       used       free     shared    buffers     cached
> > Mem:           474        468          5          0          0        239
> > -/+ buffers/cache:        229        244
> > Swap:         1023        221        802
> 
> Wow, that really confused me for a second as we shouldn't read more
> pages ahead than without the patch, probably even less under stress.

Yup - swap readahead is much more challenging than sequential readahead,
in that it must be accurate enough given some really obscure patterns.

> So the problem has to be a runaway reading.  And indeed, severe
> stupidity here:
> 
> +       window = cluster << PAGE_SHIFT;
> +       min = addr & ~(window - 1);
> +       max = min + cluster;
> +       /*
> +        * To keep the locking/highpte mapping simple, stay
> +        * within the PTE range of one PMD entry.
> +        */
> +       limit = addr & PMD_MASK;
> +       if (limit > min)
> +               min = limit;
> +       limit = pmd_addr_end(addr, max);
> +       if (limit < max)
> +               max = limit;
> +       limit = max - min;
> 
> The mistake is at the initial calculation of max.  It should be
> 
> 	max = min + window;
> 
> The resulting problem is that min could get bigger than max when
> cluster is bigger than PMD_SHIFT.  Did you use page_cluster == 5?

No I use the default 3.

btw, the mistake reflects bad named variables. How about rename
        cluster => pages
        window  => bytes
?

> The initial min is aligned to a value below the PMD boundary and max
> based on it with a too small offset, staying below the PMD boundary as
> well.  When min is rounded up, this becomes a bit large:
> 
> 	limit = max - min;
> 
> So if my brain is already functioning, fixing the initial max should
> be enough because either
> 
> 	o window is smaller than PMD_SIZE, than we won't round down
> 	below a PMD boundary in the first place or
> 
> 	o window is bigger than PMD_SIZE, than we can round down below
> 	a PMD boundary but adding window to that is garuanteed to
> 	cross the boundary again
> 
> and thus max is always bigger than min.
> 
> Fengguang, does this make sense?  If so, the patch below should fix
> it.

Too bad, a quick test of the below patch freezes the box..

Thanks,
Fengguang

> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2467,7 +2467,7 @@ static int swap_readahead_ptes(struct mm
>  
>  	window = cluster << PAGE_SHIFT;
>  	min = addr & ~(window - 1);
> -	max = min + cluster;
> +	max = min + window;
>  	/*
>  	 * To keep the locking/highpte mapping simple, stay
>  	 * within the PTE range of one PMD entry.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10  8:11         ` Wu Fengguang
  0 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-10  8:11 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel

On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> Hi Fengguang,
> 
> On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > [resend with lists cc'd, sorry]
> > > 
> > > [and fixed Hugh's email.  crap]
> > > 
> > > > Hi,
> > > > 
> > > > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > > > array of that many swap entries on the stack, but I think it is better
> > > > to limit the cluster size to a sane maximum than using dynamic
> > > > allocation for this purpose.
> > 
> > Hi Johannes,
> > 
> > When stress testing your patch, I found it triggered many OOM kills.
> > Around the time of last OOMs, the memory usage is:
> > 
> >              total       used       free     shared    buffers     cached
> > Mem:           474        468          5          0          0        239
> > -/+ buffers/cache:        229        244
> > Swap:         1023        221        802
> 
> Wow, that really confused me for a second as we shouldn't read more
> pages ahead than without the patch, probably even less under stress.

Yup - swap readahead is much more challenging than sequential readahead,
in that it must be accurate enough given some really obscure patterns.

> So the problem has to be a runaway reading.  And indeed, severe
> stupidity here:
> 
> +       window = cluster << PAGE_SHIFT;
> +       min = addr & ~(window - 1);
> +       max = min + cluster;
> +       /*
> +        * To keep the locking/highpte mapping simple, stay
> +        * within the PTE range of one PMD entry.
> +        */
> +       limit = addr & PMD_MASK;
> +       if (limit > min)
> +               min = limit;
> +       limit = pmd_addr_end(addr, max);
> +       if (limit < max)
> +               max = limit;
> +       limit = max - min;
> 
> The mistake is at the initial calculation of max.  It should be
> 
> 	max = min + window;
> 
> The resulting problem is that min could get bigger than max when
> cluster is bigger than PMD_SHIFT.  Did you use page_cluster == 5?

No I use the default 3.

btw, the mistake reflects bad named variables. How about rename
        cluster => pages
        window  => bytes
?

> The initial min is aligned to a value below the PMD boundary and max
> based on it with a too small offset, staying below the PMD boundary as
> well.  When min is rounded up, this becomes a bit large:
> 
> 	limit = max - min;
> 
> So if my brain is already functioning, fixing the initial max should
> be enough because either
> 
> 	o window is smaller than PMD_SIZE, than we won't round down
> 	below a PMD boundary in the first place or
> 
> 	o window is bigger than PMD_SIZE, than we can round down below
> 	a PMD boundary but adding window to that is garuanteed to
> 	cross the boundary again
> 
> and thus max is always bigger than min.
> 
> Fengguang, does this make sense?  If so, the patch below should fix
> it.

Too bad, a quick test of the below patch freezes the box..

Thanks,
Fengguang

> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2467,7 +2467,7 @@ static int swap_readahead_ptes(struct mm
>  
>  	window = cluster << PAGE_SHIFT;
>  	min = addr & ~(window - 1);
> -	max = min + cluster;
> +	max = min + window;
>  	/*
>  	 * To keep the locking/highpte mapping simple, stay
>  	 * within the PTE range of one PMD entry.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10  8:11         ` Wu Fengguang
@ 2009-06-10  8:32           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 55+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-10  8:32 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Johannes Weiner, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

On Wed, 10 Jun 2009 16:11:32 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> > Hi Fengguang,
> > 
> > On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > > [resend with lists cc'd, sorry]
> > > > 
> > > > [and fixed Hugh's email.  crap]
> > > > 
> > > > > Hi,
> > > > > 
> > > > > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > > > > array of that many swap entries on the stack, but I think it is better
> > > > > to limit the cluster size to a sane maximum than using dynamic
> > > > > allocation for this purpose.
> > > 
> > > Hi Johannes,
> > > 
> > > When stress testing your patch, I found it triggered many OOM kills.
> > > Around the time of last OOMs, the memory usage is:
> > > 
> > >              total       used       free     shared    buffers     cached
> > > Mem:           474        468          5          0          0        239
> > > -/+ buffers/cache:        229        244
> > > Swap:         1023        221        802
> > 
> > Wow, that really confused me for a second as we shouldn't read more
> > pages ahead than without the patch, probably even less under stress.
> 
> Yup - swap readahead is much more challenging than sequential readahead,
> in that it must be accurate enough given some really obscure patterns.
> 
> > So the problem has to be a runaway reading.  And indeed, severe
> > stupidity here:
> > 
> > +       window = cluster << PAGE_SHIFT;
> > +       min = addr & ~(window - 1);
> > +       max = min + cluster;
> > +       /*
> > +        * To keep the locking/highpte mapping simple, stay
> > +        * within the PTE range of one PMD entry.
> > +        */
> > +       limit = addr & PMD_MASK;
> > +       if (limit > min)
> > +               min = limit;
> > +       limit = pmd_addr_end(addr, max);
> > +       if (limit < max)
> > +               max = limit;
> > +       limit = max - min;
> > 
> > The mistake is at the initial calculation of max.  It should be
> > 
> > 	max = min + window;
> > 
> > The resulting problem is that min could get bigger than max when
> > cluster is bigger than PMD_SHIFT.  Did you use page_cluster == 5?
> 
> No I use the default 3.
> 
> btw, the mistake reflects bad named variables. How about rename
>         cluster => pages
>         window  => bytes
> ?
> 
> > The initial min is aligned to a value below the PMD boundary and max
> > based on it with a too small offset, staying below the PMD boundary as
> > well.  When min is rounded up, this becomes a bit large:
> > 
> > 	limit = max - min;
> > 
> > So if my brain is already functioning, fixing the initial max should
> > be enough because either
> > 
> > 	o window is smaller than PMD_SIZE, than we won't round down
> > 	below a PMD boundary in the first place or
> > 
> > 	o window is bigger than PMD_SIZE, than we can round down below
> > 	a PMD boundary but adding window to that is garuanteed to
> > 	cross the boundary again
> > 
> > and thus max is always bigger than min.
> > 
> > Fengguang, does this make sense?  If so, the patch below should fix
> > it.
> 
> Too bad, a quick test of the below patch freezes the box..
> 

+	window = cluster << PAGE_SHIFT;
+	min = addr & ~(window - 1);
+	max = min + cluster;

max = min + window; # this is fixed. then,

+	/*
+	 * To keep the locking/highpte mapping simple, stay
+	 * within the PTE range of one PMD entry.
+	 */
+	limit = addr & PMD_MASK;
+	if (limit > min)
+		min = limit;
+	limit = pmd_addr_end(addr, max);
+	if (limit < max)
+		max = limit;
+	limit = max - min;

limit = (max - min) >> PAGE_SHIFT;  

+	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
+	for (i = nr = 0; i < limit; i++)
+		if (is_swap_pte(ptep[i]))
+			entries[nr++] = pte_to_swp_entry(ptep[i]);
+	pte_unmap_unlock(ptep, ptl);

Cheer!,
-Kame



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10  8:32           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 55+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-10  8:32 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Johannes Weiner, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

On Wed, 10 Jun 2009 16:11:32 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> > Hi Fengguang,
> > 
> > On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > > [resend with lists cc'd, sorry]
> > > > 
> > > > [and fixed Hugh's email.  crap]
> > > > 
> > > > > Hi,
> > > > > 
> > > > > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > > > > array of that many swap entries on the stack, but I think it is better
> > > > > to limit the cluster size to a sane maximum than using dynamic
> > > > > allocation for this purpose.
> > > 
> > > Hi Johannes,
> > > 
> > > When stress testing your patch, I found it triggered many OOM kills.
> > > Around the time of last OOMs, the memory usage is:
> > > 
> > >              total       used       free     shared    buffers     cached
> > > Mem:           474        468          5          0          0        239
> > > -/+ buffers/cache:        229        244
> > > Swap:         1023        221        802
> > 
> > Wow, that really confused me for a second as we shouldn't read more
> > pages ahead than without the patch, probably even less under stress.
> 
> Yup - swap readahead is much more challenging than sequential readahead,
> in that it must be accurate enough given some really obscure patterns.
> 
> > So the problem has to be a runaway reading.  And indeed, severe
> > stupidity here:
> > 
> > +       window = cluster << PAGE_SHIFT;
> > +       min = addr & ~(window - 1);
> > +       max = min + cluster;
> > +       /*
> > +        * To keep the locking/highpte mapping simple, stay
> > +        * within the PTE range of one PMD entry.
> > +        */
> > +       limit = addr & PMD_MASK;
> > +       if (limit > min)
> > +               min = limit;
> > +       limit = pmd_addr_end(addr, max);
> > +       if (limit < max)
> > +               max = limit;
> > +       limit = max - min;
> > 
> > The mistake is at the initial calculation of max.  It should be
> > 
> > 	max = min + window;
> > 
> > The resulting problem is that min could get bigger than max when
> > cluster is bigger than PMD_SHIFT.  Did you use page_cluster == 5?
> 
> No I use the default 3.
> 
> btw, the mistake reflects bad named variables. How about rename
>         cluster => pages
>         window  => bytes
> ?
> 
> > The initial min is aligned to a value below the PMD boundary and max
> > based on it with a too small offset, staying below the PMD boundary as
> > well.  When min is rounded up, this becomes a bit large:
> > 
> > 	limit = max - min;
> > 
> > So if my brain is already functioning, fixing the initial max should
> > be enough because either
> > 
> > 	o window is smaller than PMD_SIZE, than we won't round down
> > 	below a PMD boundary in the first place or
> > 
> > 	o window is bigger than PMD_SIZE, than we can round down below
> > 	a PMD boundary but adding window to that is garuanteed to
> > 	cross the boundary again
> > 
> > and thus max is always bigger than min.
> > 
> > Fengguang, does this make sense?  If so, the patch below should fix
> > it.
> 
> Too bad, a quick test of the below patch freezes the box..
> 

+	window = cluster << PAGE_SHIFT;
+	min = addr & ~(window - 1);
+	max = min + cluster;

max = min + window; # this is fixed. then,

+	/*
+	 * To keep the locking/highpte mapping simple, stay
+	 * within the PTE range of one PMD entry.
+	 */
+	limit = addr & PMD_MASK;
+	if (limit > min)
+		min = limit;
+	limit = pmd_addr_end(addr, max);
+	if (limit < max)
+		max = limit;
+	limit = max - min;

limit = (max - min) >> PAGE_SHIFT;  

+	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
+	for (i = nr = 0; i < limit; i++)
+		if (is_swap_pte(ptep[i]))
+			entries[nr++] = pte_to_swp_entry(ptep[i]);
+	pte_unmap_unlock(ptep, ptl);

Cheer!,
-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10  8:32           ` KAMEZAWA Hiroyuki
@ 2009-06-10  8:56             ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-10  8:56 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Johannes Weiner, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

On Wed, Jun 10, 2009 at 04:32:49PM +0800, KAMEZAWA Hiroyuki wrote:
> On Wed, 10 Jun 2009 16:11:32 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> > > Hi Fengguang,
> > > 
> > > On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > > > [resend with lists cc'd, sorry]
> > > > > 
> > > > > [and fixed Hugh's email.  crap]
> > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > > > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > > > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > > > > > array of that many swap entries on the stack, but I think it is better
> > > > > > to limit the cluster size to a sane maximum than using dynamic
> > > > > > allocation for this purpose.
> > > > 
> > > > Hi Johannes,
> > > > 
> > > > When stress testing your patch, I found it triggered many OOM kills.
> > > > Around the time of last OOMs, the memory usage is:
> > > > 
> > > >              total       used       free     shared    buffers     cached
> > > > Mem:           474        468          5          0          0        239
> > > > -/+ buffers/cache:        229        244
> > > > Swap:         1023        221        802
> > > 
> > > Wow, that really confused me for a second as we shouldn't read more
> > > pages ahead than without the patch, probably even less under stress.
> > 
> > Yup - swap readahead is much more challenging than sequential readahead,
> > in that it must be accurate enough given some really obscure patterns.
> > 
> > > So the problem has to be a runaway reading.  And indeed, severe
> > > stupidity here:
> > > 
> > > +       window = cluster << PAGE_SHIFT;
> > > +       min = addr & ~(window - 1);
> > > +       max = min + cluster;
> > > +       /*
> > > +        * To keep the locking/highpte mapping simple, stay
> > > +        * within the PTE range of one PMD entry.
> > > +        */
> > > +       limit = addr & PMD_MASK;
> > > +       if (limit > min)
> > > +               min = limit;
> > > +       limit = pmd_addr_end(addr, max);
> > > +       if (limit < max)
> > > +               max = limit;
> > > +       limit = max - min;
> > > 
> > > The mistake is at the initial calculation of max.  It should be
> > > 
> > > 	max = min + window;
> > > 
> > > The resulting problem is that min could get bigger than max when
> > > cluster is bigger than PMD_SHIFT.  Did you use page_cluster == 5?
> > 
> > No I use the default 3.
> > 
> > btw, the mistake reflects bad named variables. How about rename
> >         cluster => pages
> >         window  => bytes
> > ?
> > 
> > > The initial min is aligned to a value below the PMD boundary and max
> > > based on it with a too small offset, staying below the PMD boundary as
> > > well.  When min is rounded up, this becomes a bit large:
> > > 
> > > 	limit = max - min;
> > > 
> > > So if my brain is already functioning, fixing the initial max should
> > > be enough because either
> > > 
> > > 	o window is smaller than PMD_SIZE, than we won't round down
> > > 	below a PMD boundary in the first place or
> > > 
> > > 	o window is bigger than PMD_SIZE, than we can round down below
> > > 	a PMD boundary but adding window to that is garuanteed to
> > > 	cross the boundary again
> > > 
> > > and thus max is always bigger than min.
> > > 
> > > Fengguang, does this make sense?  If so, the patch below should fix
> > > it.
> > 
> > Too bad, a quick test of the below patch freezes the box..
> > 
> 
> +	window = cluster << PAGE_SHIFT;
> +	min = addr & ~(window - 1);
> +	max = min + cluster;
> 
> max = min + window; # this is fixed. then,
> 
> +	/*
> +	 * To keep the locking/highpte mapping simple, stay
> +	 * within the PTE range of one PMD entry.
> +	 */
> +	limit = addr & PMD_MASK;
> +	if (limit > min)
> +		min = limit;
> +	limit = pmd_addr_end(addr, max);
> +	if (limit < max)
> +		max = limit;
> +	limit = max - min;
> 
> limit = (max - min) >> PAGE_SHIFT;  
> 
> +	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
> +	for (i = nr = 0; i < limit; i++)
> +		if (is_swap_pte(ptep[i]))
> +			entries[nr++] = pte_to_swp_entry(ptep[i]);
> +	pte_unmap_unlock(ptep, ptl);

Yes it worked!  But then I run into page allocation failures:

[  340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
[  340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[  340.651839] Call Trace:
[  340.654289]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[  340.660645]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
[  340.671786]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  340.678746]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  340.685527]  [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[  340.691356]  [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[  340.696771]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  340.702518]  [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[  340.709301]  [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[  340.714529]  [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[  340.719578]  [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[  340.724969]  [<ffffffff8106b236>] ? up_read+0x26/0x30
[  340.730024]  [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[  340.736375]  [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[  340.741430]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  340.747434] Mem-Info:
[  340.749730] Node 0 DMA per-cpu:
[  340.752896] CPU    0: hi:    0, btch:   1 usd:   0
[  340.757679] CPU    1: hi:    0, btch:   1 usd:   0
[  340.762462] Node 0 DMA32 per-cpu:
[  340.765797] CPU    0: hi:  186, btch:  31 usd: 161
[  340.770582] CPU    1: hi:  186, btch:  31 usd:   0
[  340.775367] Active_anon:38344 active_file:6556 inactive_anon:41644
[  340.775368]  inactive_file:4210 unevictable:4 dirty:1 writeback:10 unstable:1
[  340.775370]  free:3136 slab:15738 mapped:8023 pagetables:6294 bounce:0
[  340.795166] Node 0 DMA free:2024kB min:84kB low:104kB high:124kB active_anon:5296kB inactive_anon:5772kB active_file:644kB inactive_file:612kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  340.814007] lowmem_reserve[]: 0 483 483 483
[  340.818277] Node 0 DMA32 free:10520kB min:2768kB low:3460kB high:4152kB active_anon:148080kB inactive_anon:160804kB active_file:25580kB inactive_file:16228kB unevictable:16kB present:495008kB pages_scanned:0 all_unreclaimable? no
[  340.838594] lowmem_reserve[]: 0 0 0 0
[  340.842338] Node 0 DMA: 87*4kB 14*8kB 2*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2028kB
[  340.853398] Node 0 DMA32: 2288*4kB 24*8kB 4*16kB 2*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 10560kB
[  340.864874] 59895 total pagecache pages
[  340.868720] 4176 pages in swap cache
[  340.872315] Swap cache stats: add 99021, delete 94845, find 8313/23463
[  340.878847] Free swap  = 780376kB
[  340.882178] Total swap = 1048568kB
[  340.889619] 131072 pages RAM
[  340.892527] 9628 pages reserved
[  340.895677] 126767 pages shared
[  340.898836] 60472 pages non-shared
[  341.026977] Xorg: page allocation failure. order:4, mode:0x40d0
[  341.032900] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[  341.038989] Call Trace:
[  341.041451]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[  341.047801]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[  341.053628]  [<ffffffff810f7840>] __remote_slab_alloc_node+0xc0/0x130
[  341.060073]  [<ffffffff810f78e5>] __remote_slab_alloc+0x35/0xc0
[  341.065983]  [<ffffffff810f76e4>] ? __slab_alloc_page+0x314/0x3b0
[  341.072070]  [<ffffffff810f8528>] __kmalloc+0xb8/0x250
[  341.077220]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  341.084184]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  341.090963]  [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[  341.096791]  [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[  341.102197]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  341.107948]  [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[  341.114726]  [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[  341.119948]  [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[  341.124996]  [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[  341.130389]  [<ffffffff8106b236>] ? up_read+0x26/0x30
[  341.135436]  [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[  341.141787]  [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[  341.146848]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  341.152855] Mem-Info:
[  341.155124] Node 0 DMA per-cpu:
[  341.158289] CPU    0: hi:    0, btch:   1 usd:   0
[  341.163074] CPU    1: hi:    0, btch:   1 usd:   0
[  341.167878] Node 0 DMA32 per-cpu:
[  341.171212] CPU    0: hi:  186, btch:  31 usd:  72
[  341.176009] CPU    1: hi:  186, btch:  31 usd:   0
[  341.180794] Active_anon:38344 active_file:6605 inactive_anon:41579
[  341.180795]  inactive_file:4180 unevictable:4 dirty:0 writeback:0 unstable:1
[  341.180797]  free:3147 slab:15867 mapped:8021 pagetables:6295 bounce:0
[  341.200505] Node 0 DMA free:2028kB min:84kB low:104kB high:124kB active_anon:5284kB inactive_anon:5784kB active_file:644kB inactive_file:612kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  341.219339] lowmem_reserve[]: 0 483 483 483
[  341.223605] Node 0 DMA32 free:10560kB min:2768kB low:3460kB high:4152kB active_anon:148092kB inactive_anon:160532kB active_file:25776kB inactive_file:16108kB unevictable:16kB present:495008kB pages_scanned:618 all_unreclaimable? no
[  341.244093] lowmem_reserve[]: 0 0 0 0
[  341.247851] Node 0 DMA: 87*4kB 14*8kB 2*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2028kB
[  341.258769] Node 0 DMA32: 2296*4kB 18*8kB 5*16kB 2*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 10560kB
[  341.270121] 59860 total pagecache pages
[  341.273957] 4142 pages in swap cache
[  341.277531] Swap cache stats: add 99071, delete 94929, find 8313/23465
[  341.284052] Free swap  = 780184kB
[  341.287357] Total swap = 1048568kB
[  341.294497] 131072 pages RAM
[  341.297396] 9628 pages reserved
[  341.300538] 126655 pages shared
[  341.303674] 60501 pages non-shared
[  357.833157] Xorg: page allocation failure. order:4, mode:0x40d0
[  357.839105] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[  357.845243] Call Trace:
[  357.847737]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[  357.854108]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[  357.859965]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
[  357.865263]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  357.872245]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  357.879029]  [<ffffffff810ea8bb>] ? swap_info_get+0x6b/0xf0
[  357.884626]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  357.890396]  [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[  357.897190]  [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[  357.902412]  [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[  357.907460]  [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[  357.912873]  [<ffffffff8106b236>] ? up_read+0x26/0x30
[  357.917923]  [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[  357.924289]  [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[  357.929347]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  357.935350] Mem-Info:
[  357.937630] Node 0 DMA per-cpu:
[  357.940801] CPU    0: hi:    0, btch:   1 usd:   0
[  357.945590] CPU    1: hi:    0, btch:   1 usd:   0
[  357.950379] Node 0 DMA32 per-cpu:
[  357.953728] CPU    0: hi:  186, btch:  31 usd: 159
[  357.958513] CPU    1: hi:  186, btch:  31 usd:   0
[  357.963300] Active_anon:38863 active_file:6095 inactive_anon:41764
[  357.963301]  inactive_file:4777 unevictable:4 dirty:0 writeback:18 unstable:0
[  357.963302]  free:2317 slab:15674 mapped:8121 pagetables:6408 bounce:0
[  357.983105] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5268kB inactive_anon:5768kB active_file:644kB inactive_file:632kB unevictable:0kB present:15164kB pages_scanned:65 all_unreclaimable? no
[  358.002033] lowmem_reserve[]: 0 483 483 483
[  358.006331] Node 0 DMA32 free:7380kB min:2768kB low:3460kB high:4152kB active_anon:150124kB inactive_anon:161368kB active_file:23736kB inactive_file:18404kB unevictable:16kB present:495008kB pages_scanned:32 all_unreclaimable? no
[  358.026802] lowmem_reserve[]: 0 0 0 0
[  358.030561] Node 0 DMA: 81*4kB 11*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  358.041571] Node 0 DMA32: 1534*4kB 29*8kB 3*16kB 4*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7504kB
[  358.052856] 60223 total pagecache pages
[  358.056690] 4367 pages in swap cache
[  358.060265] Swap cache stats: add 105056, delete 100689, find 9043/26609
[  358.066954] Free swap  = 774800kB
[  358.070268] Total swap = 1048568kB
[  358.077041] 131072 pages RAM
[  358.079954] 9628 pages reserved
[  358.083094] 128803 pages shared
[  358.086237] 61031 pages non-shared
[  507.741934] Xorg: page allocation failure. order:4, mode:0x40d0
[  507.748019] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[  507.754182] Call Trace:
[  507.756636]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[  507.762988]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[  507.768812]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
[  507.774048]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  507.781010]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  507.787798]  [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[  507.793636]  [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[  507.799043]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  507.804788]  [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[  507.811572]  [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[  507.816788]  [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[  507.821847]  [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[  507.827244]  [<ffffffff8106b236>] ? up_read+0x26/0x30
[  507.832291]  [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[  507.838642]  [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[  507.843696]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  507.849699] Mem-Info:
[  507.851973] Node 0 DMA per-cpu:
[  507.855130] CPU    0: hi:    0, btch:   1 usd:   0
[  507.859916] CPU    1: hi:    0, btch:   1 usd:   0
[  507.864700] Node 0 DMA32 per-cpu:
[  507.868036] CPU    0: hi:  186, btch:  31 usd:   0
[  507.872819] CPU    1: hi:  186, btch:  31 usd:  30
[  507.876816] Active_anon:34956 active_file:5472 inactive_anon:45220
[  507.876816]  inactive_file:6158 unevictable:4 dirty:13 writeback:2 unstable:0
[  507.876816]  free:1726 slab:15603 mapped:7450 pagetables:6818 bounce:0
[  507.897413] Node 0 DMA free:2044kB min:84kB low:104kB high:124kB active_anon:5060kB inactive_anon:6028kB active_file:644kB inactive_file:624kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  507.916249] lowmem_reserve[]: 0 483 483 483
[  507.920598] Node 0 DMA32 free:4488kB min:2768kB low:3460kB high:4152kB active_anon:134764kB inactive_anon:174852kB active_file:21244kB inactive_file:24008kB unevictable:16kB present:495008kB pages_scanned:0 all_unreclaimable? no
[  507.940856] lowmem_reserve[]: 0 0 0 0
[  507.944849] Node 0 DMA: 51*4kB 14*8kB 0*16kB 4*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2044kB
[  507.955772] Node 0 DMA32: 888*4kB 1*8kB 0*16kB 3*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4488kB
[  507.966871] 64772 total pagecache pages
[  507.970702] 6574 pages in swap cache
[  507.974276] Swap cache stats: add 161629, delete 155055, find 17122/59120
[  507.981051] Free swap  = 735792kB
[  507.984361] Total swap = 1048568kB
[  507.991453] 131072 pages RAM
[  507.994364] 9628 pages reserved
[  507.997503] 114413 pages shared
[  508.000643] 59801 pages non-shared
[  509.462416] NFS: Server wrote zero bytes, expected 756.
[  580.369464] Xorg: page allocation failure. order:4, mode:0x40d0
[  580.375400] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[  580.381522] Call Trace:
[  580.384092]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[  580.390669]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[  580.396802]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
[  580.402033]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  580.408992]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  580.415775]  [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[  580.421607]  [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[  580.427033]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  580.432804]  [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[  580.439600]  [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[  580.444824]  [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[  580.449889]  [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[  580.455287]  [<ffffffff8106b236>] ? up_read+0x26/0x30
[  580.460353]  [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[  580.466702]  [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[  580.471751]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  580.477753] Mem-Info:
[  580.480020] Node 0 DMA per-cpu:
[  580.483189] CPU    0: hi:    0, btch:   1 usd:   0
[  580.487977] CPU    1: hi:    0, btch:   1 usd:   0
[  580.492767] Node 0 DMA32 per-cpu:
[  580.496095] CPU    0: hi:  186, btch:  31 usd:  90
[  580.500892] CPU    1: hi:  186, btch:  31 usd:   1
[  580.505679] Active_anon:34315 active_file:5739 inactive_anon:45597
[  580.505681]  inactive_file:5830 unevictable:4 dirty:2 writeback:0 unstable:1
[  580.505682]  free:3781 slab:13422 mapped:6830 pagetables:7180 bounce:0
[  580.525398] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5024kB inactive_anon:6012kB active_file:640kB inactive_file:608kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  580.544234] lowmem_reserve[]: 0 483 483 483
[  580.548504] Node 0 DMA32 free:13108kB min:2768kB low:3460kB high:4152kB active_anon:132236kB inactive_anon:176376kB active_file:22316kB inactive_file:22712kB unevictable:16kB present:495008kB pages_scanned:417 all_unreclaimable? no
[  580.568992] lowmem_reserve[]: 0 0 0 0
[  580.572741] Node 0 DMA: 56*4kB 22*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  580.583661] Node 0 DMA32: 2995*4kB 23*8kB 1*16kB 1*32kB 4*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13108kB
[  580.595010] 64782 total pagecache pages
[  580.598845] 6586 pages in swap cache
[  580.602421] Swap cache stats: add 185372, delete 178786, find 19755/72917
[  580.609205] Free swap  = 722720kB
[  580.612513] Total swap = 1048568kB
[  580.619688] 131072 pages RAM
[  580.622586] 9628 pages reserved
[  580.625726] 112220 pages shared
[  580.628868] 58034 pages non-shared

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10  8:56             ` Wu Fengguang
  0 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-10  8:56 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Johannes Weiner, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

On Wed, Jun 10, 2009 at 04:32:49PM +0800, KAMEZAWA Hiroyuki wrote:
> On Wed, 10 Jun 2009 16:11:32 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> > > Hi Fengguang,
> > > 
> > > On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > > > [resend with lists cc'd, sorry]
> > > > > 
> > > > > [and fixed Hugh's email.  crap]
> > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > > > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > > > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > > > > > array of that many swap entries on the stack, but I think it is better
> > > > > > to limit the cluster size to a sane maximum than using dynamic
> > > > > > allocation for this purpose.
> > > > 
> > > > Hi Johannes,
> > > > 
> > > > When stress testing your patch, I found it triggered many OOM kills.
> > > > Around the time of last OOMs, the memory usage is:
> > > > 
> > > >              total       used       free     shared    buffers     cached
> > > > Mem:           474        468          5          0          0        239
> > > > -/+ buffers/cache:        229        244
> > > > Swap:         1023        221        802
> > > 
> > > Wow, that really confused me for a second as we shouldn't read more
> > > pages ahead than without the patch, probably even less under stress.
> > 
> > Yup - swap readahead is much more challenging than sequential readahead,
> > in that it must be accurate enough given some really obscure patterns.
> > 
> > > So the problem has to be a runaway reading.  And indeed, severe
> > > stupidity here:
> > > 
> > > +       window = cluster << PAGE_SHIFT;
> > > +       min = addr & ~(window - 1);
> > > +       max = min + cluster;
> > > +       /*
> > > +        * To keep the locking/highpte mapping simple, stay
> > > +        * within the PTE range of one PMD entry.
> > > +        */
> > > +       limit = addr & PMD_MASK;
> > > +       if (limit > min)
> > > +               min = limit;
> > > +       limit = pmd_addr_end(addr, max);
> > > +       if (limit < max)
> > > +               max = limit;
> > > +       limit = max - min;
> > > 
> > > The mistake is at the initial calculation of max.  It should be
> > > 
> > > 	max = min + window;
> > > 
> > > The resulting problem is that min could get bigger than max when
> > > cluster is bigger than PMD_SHIFT.  Did you use page_cluster == 5?
> > 
> > No I use the default 3.
> > 
> > btw, the mistake reflects bad named variables. How about rename
> >         cluster => pages
> >         window  => bytes
> > ?
> > 
> > > The initial min is aligned to a value below the PMD boundary and max
> > > based on it with a too small offset, staying below the PMD boundary as
> > > well.  When min is rounded up, this becomes a bit large:
> > > 
> > > 	limit = max - min;
> > > 
> > > So if my brain is already functioning, fixing the initial max should
> > > be enough because either
> > > 
> > > 	o window is smaller than PMD_SIZE, than we won't round down
> > > 	below a PMD boundary in the first place or
> > > 
> > > 	o window is bigger than PMD_SIZE, than we can round down below
> > > 	a PMD boundary but adding window to that is garuanteed to
> > > 	cross the boundary again
> > > 
> > > and thus max is always bigger than min.
> > > 
> > > Fengguang, does this make sense?  If so, the patch below should fix
> > > it.
> > 
> > Too bad, a quick test of the below patch freezes the box..
> > 
> 
> +	window = cluster << PAGE_SHIFT;
> +	min = addr & ~(window - 1);
> +	max = min + cluster;
> 
> max = min + window; # this is fixed. then,
> 
> +	/*
> +	 * To keep the locking/highpte mapping simple, stay
> +	 * within the PTE range of one PMD entry.
> +	 */
> +	limit = addr & PMD_MASK;
> +	if (limit > min)
> +		min = limit;
> +	limit = pmd_addr_end(addr, max);
> +	if (limit < max)
> +		max = limit;
> +	limit = max - min;
> 
> limit = (max - min) >> PAGE_SHIFT;  
> 
> +	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
> +	for (i = nr = 0; i < limit; i++)
> +		if (is_swap_pte(ptep[i]))
> +			entries[nr++] = pte_to_swp_entry(ptep[i]);
> +	pte_unmap_unlock(ptep, ptl);

Yes it worked!  But then I run into page allocation failures:

[  340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
[  340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[  340.651839] Call Trace:
[  340.654289]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[  340.660645]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
[  340.671786]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  340.678746]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  340.685527]  [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[  340.691356]  [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[  340.696771]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  340.702518]  [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[  340.709301]  [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[  340.714529]  [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[  340.719578]  [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[  340.724969]  [<ffffffff8106b236>] ? up_read+0x26/0x30
[  340.730024]  [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[  340.736375]  [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[  340.741430]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  340.747434] Mem-Info:
[  340.749730] Node 0 DMA per-cpu:
[  340.752896] CPU    0: hi:    0, btch:   1 usd:   0
[  340.757679] CPU    1: hi:    0, btch:   1 usd:   0
[  340.762462] Node 0 DMA32 per-cpu:
[  340.765797] CPU    0: hi:  186, btch:  31 usd: 161
[  340.770582] CPU    1: hi:  186, btch:  31 usd:   0
[  340.775367] Active_anon:38344 active_file:6556 inactive_anon:41644
[  340.775368]  inactive_file:4210 unevictable:4 dirty:1 writeback:10 unstable:1
[  340.775370]  free:3136 slab:15738 mapped:8023 pagetables:6294 bounce:0
[  340.795166] Node 0 DMA free:2024kB min:84kB low:104kB high:124kB active_anon:5296kB inactive_anon:5772kB active_file:644kB inactive_file:612kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  340.814007] lowmem_reserve[]: 0 483 483 483
[  340.818277] Node 0 DMA32 free:10520kB min:2768kB low:3460kB high:4152kB active_anon:148080kB inactive_anon:160804kB active_file:25580kB inactive_file:16228kB unevictable:16kB present:495008kB pages_scanned:0 all_unreclaimable? no
[  340.838594] lowmem_reserve[]: 0 0 0 0
[  340.842338] Node 0 DMA: 87*4kB 14*8kB 2*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2028kB
[  340.853398] Node 0 DMA32: 2288*4kB 24*8kB 4*16kB 2*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 10560kB
[  340.864874] 59895 total pagecache pages
[  340.868720] 4176 pages in swap cache
[  340.872315] Swap cache stats: add 99021, delete 94845, find 8313/23463
[  340.878847] Free swap  = 780376kB
[  340.882178] Total swap = 1048568kB
[  340.889619] 131072 pages RAM
[  340.892527] 9628 pages reserved
[  340.895677] 126767 pages shared
[  340.898836] 60472 pages non-shared
[  341.026977] Xorg: page allocation failure. order:4, mode:0x40d0
[  341.032900] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[  341.038989] Call Trace:
[  341.041451]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[  341.047801]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[  341.053628]  [<ffffffff810f7840>] __remote_slab_alloc_node+0xc0/0x130
[  341.060073]  [<ffffffff810f78e5>] __remote_slab_alloc+0x35/0xc0
[  341.065983]  [<ffffffff810f76e4>] ? __slab_alloc_page+0x314/0x3b0
[  341.072070]  [<ffffffff810f8528>] __kmalloc+0xb8/0x250
[  341.077220]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  341.084184]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  341.090963]  [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[  341.096791]  [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[  341.102197]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  341.107948]  [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[  341.114726]  [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[  341.119948]  [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[  341.124996]  [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[  341.130389]  [<ffffffff8106b236>] ? up_read+0x26/0x30
[  341.135436]  [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[  341.141787]  [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[  341.146848]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  341.152855] Mem-Info:
[  341.155124] Node 0 DMA per-cpu:
[  341.158289] CPU    0: hi:    0, btch:   1 usd:   0
[  341.163074] CPU    1: hi:    0, btch:   1 usd:   0
[  341.167878] Node 0 DMA32 per-cpu:
[  341.171212] CPU    0: hi:  186, btch:  31 usd:  72
[  341.176009] CPU    1: hi:  186, btch:  31 usd:   0
[  341.180794] Active_anon:38344 active_file:6605 inactive_anon:41579
[  341.180795]  inactive_file:4180 unevictable:4 dirty:0 writeback:0 unstable:1
[  341.180797]  free:3147 slab:15867 mapped:8021 pagetables:6295 bounce:0
[  341.200505] Node 0 DMA free:2028kB min:84kB low:104kB high:124kB active_anon:5284kB inactive_anon:5784kB active_file:644kB inactive_file:612kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  341.219339] lowmem_reserve[]: 0 483 483 483
[  341.223605] Node 0 DMA32 free:10560kB min:2768kB low:3460kB high:4152kB active_anon:148092kB inactive_anon:160532kB active_file:25776kB inactive_file:16108kB unevictable:16kB present:495008kB pages_scanned:618 all_unreclaimable? no
[  341.244093] lowmem_reserve[]: 0 0 0 0
[  341.247851] Node 0 DMA: 87*4kB 14*8kB 2*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2028kB
[  341.258769] Node 0 DMA32: 2296*4kB 18*8kB 5*16kB 2*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 10560kB
[  341.270121] 59860 total pagecache pages
[  341.273957] 4142 pages in swap cache
[  341.277531] Swap cache stats: add 99071, delete 94929, find 8313/23465
[  341.284052] Free swap  = 780184kB
[  341.287357] Total swap = 1048568kB
[  341.294497] 131072 pages RAM
[  341.297396] 9628 pages reserved
[  341.300538] 126655 pages shared
[  341.303674] 60501 pages non-shared
[  357.833157] Xorg: page allocation failure. order:4, mode:0x40d0
[  357.839105] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[  357.845243] Call Trace:
[  357.847737]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[  357.854108]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[  357.859965]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
[  357.865263]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  357.872245]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  357.879029]  [<ffffffff810ea8bb>] ? swap_info_get+0x6b/0xf0
[  357.884626]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  357.890396]  [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[  357.897190]  [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[  357.902412]  [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[  357.907460]  [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[  357.912873]  [<ffffffff8106b236>] ? up_read+0x26/0x30
[  357.917923]  [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[  357.924289]  [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[  357.929347]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  357.935350] Mem-Info:
[  357.937630] Node 0 DMA per-cpu:
[  357.940801] CPU    0: hi:    0, btch:   1 usd:   0
[  357.945590] CPU    1: hi:    0, btch:   1 usd:   0
[  357.950379] Node 0 DMA32 per-cpu:
[  357.953728] CPU    0: hi:  186, btch:  31 usd: 159
[  357.958513] CPU    1: hi:  186, btch:  31 usd:   0
[  357.963300] Active_anon:38863 active_file:6095 inactive_anon:41764
[  357.963301]  inactive_file:4777 unevictable:4 dirty:0 writeback:18 unstable:0
[  357.963302]  free:2317 slab:15674 mapped:8121 pagetables:6408 bounce:0
[  357.983105] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5268kB inactive_anon:5768kB active_file:644kB inactive_file:632kB unevictable:0kB present:15164kB pages_scanned:65 all_unreclaimable? no
[  358.002033] lowmem_reserve[]: 0 483 483 483
[  358.006331] Node 0 DMA32 free:7380kB min:2768kB low:3460kB high:4152kB active_anon:150124kB inactive_anon:161368kB active_file:23736kB inactive_file:18404kB unevictable:16kB present:495008kB pages_scanned:32 all_unreclaimable? no
[  358.026802] lowmem_reserve[]: 0 0 0 0
[  358.030561] Node 0 DMA: 81*4kB 11*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  358.041571] Node 0 DMA32: 1534*4kB 29*8kB 3*16kB 4*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7504kB
[  358.052856] 60223 total pagecache pages
[  358.056690] 4367 pages in swap cache
[  358.060265] Swap cache stats: add 105056, delete 100689, find 9043/26609
[  358.066954] Free swap  = 774800kB
[  358.070268] Total swap = 1048568kB
[  358.077041] 131072 pages RAM
[  358.079954] 9628 pages reserved
[  358.083094] 128803 pages shared
[  358.086237] 61031 pages non-shared
[  507.741934] Xorg: page allocation failure. order:4, mode:0x40d0
[  507.748019] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[  507.754182] Call Trace:
[  507.756636]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[  507.762988]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[  507.768812]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
[  507.774048]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  507.781010]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  507.787798]  [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[  507.793636]  [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[  507.799043]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  507.804788]  [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[  507.811572]  [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[  507.816788]  [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[  507.821847]  [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[  507.827244]  [<ffffffff8106b236>] ? up_read+0x26/0x30
[  507.832291]  [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[  507.838642]  [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[  507.843696]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  507.849699] Mem-Info:
[  507.851973] Node 0 DMA per-cpu:
[  507.855130] CPU    0: hi:    0, btch:   1 usd:   0
[  507.859916] CPU    1: hi:    0, btch:   1 usd:   0
[  507.864700] Node 0 DMA32 per-cpu:
[  507.868036] CPU    0: hi:  186, btch:  31 usd:   0
[  507.872819] CPU    1: hi:  186, btch:  31 usd:  30
[  507.876816] Active_anon:34956 active_file:5472 inactive_anon:45220
[  507.876816]  inactive_file:6158 unevictable:4 dirty:13 writeback:2 unstable:0
[  507.876816]  free:1726 slab:15603 mapped:7450 pagetables:6818 bounce:0
[  507.897413] Node 0 DMA free:2044kB min:84kB low:104kB high:124kB active_anon:5060kB inactive_anon:6028kB active_file:644kB inactive_file:624kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  507.916249] lowmem_reserve[]: 0 483 483 483
[  507.920598] Node 0 DMA32 free:4488kB min:2768kB low:3460kB high:4152kB active_anon:134764kB inactive_anon:174852kB active_file:21244kB inactive_file:24008kB unevictable:16kB present:495008kB pages_scanned:0 all_unreclaimable? no
[  507.940856] lowmem_reserve[]: 0 0 0 0
[  507.944849] Node 0 DMA: 51*4kB 14*8kB 0*16kB 4*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2044kB
[  507.955772] Node 0 DMA32: 888*4kB 1*8kB 0*16kB 3*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4488kB
[  507.966871] 64772 total pagecache pages
[  507.970702] 6574 pages in swap cache
[  507.974276] Swap cache stats: add 161629, delete 155055, find 17122/59120
[  507.981051] Free swap  = 735792kB
[  507.984361] Total swap = 1048568kB
[  507.991453] 131072 pages RAM
[  507.994364] 9628 pages reserved
[  507.997503] 114413 pages shared
[  508.000643] 59801 pages non-shared
[  509.462416] NFS: Server wrote zero bytes, expected 756.
[  580.369464] Xorg: page allocation failure. order:4, mode:0x40d0
[  580.375400] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[  580.381522] Call Trace:
[  580.384092]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[  580.390669]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[  580.396802]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
[  580.402033]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  580.408992]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[  580.415775]  [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[  580.421607]  [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[  580.427033]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  580.432804]  [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[  580.439600]  [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[  580.444824]  [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[  580.449889]  [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[  580.455287]  [<ffffffff8106b236>] ? up_read+0x26/0x30
[  580.460353]  [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[  580.466702]  [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[  580.471751]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  580.477753] Mem-Info:
[  580.480020] Node 0 DMA per-cpu:
[  580.483189] CPU    0: hi:    0, btch:   1 usd:   0
[  580.487977] CPU    1: hi:    0, btch:   1 usd:   0
[  580.492767] Node 0 DMA32 per-cpu:
[  580.496095] CPU    0: hi:  186, btch:  31 usd:  90
[  580.500892] CPU    1: hi:  186, btch:  31 usd:   1
[  580.505679] Active_anon:34315 active_file:5739 inactive_anon:45597
[  580.505681]  inactive_file:5830 unevictable:4 dirty:2 writeback:0 unstable:1
[  580.505682]  free:3781 slab:13422 mapped:6830 pagetables:7180 bounce:0
[  580.525398] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5024kB inactive_anon:6012kB active_file:640kB inactive_file:608kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  580.544234] lowmem_reserve[]: 0 483 483 483
[  580.548504] Node 0 DMA32 free:13108kB min:2768kB low:3460kB high:4152kB active_anon:132236kB inactive_anon:176376kB active_file:22316kB inactive_file:22712kB unevictable:16kB present:495008kB pages_scanned:417 all_unreclaimable? no
[  580.568992] lowmem_reserve[]: 0 0 0 0
[  580.572741] Node 0 DMA: 56*4kB 22*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  580.583661] Node 0 DMA32: 2995*4kB 23*8kB 1*16kB 1*32kB 4*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13108kB
[  580.595010] 64782 total pagecache pages
[  580.598845] 6586 pages in swap cache
[  580.602421] Swap cache stats: add 185372, delete 178786, find 19755/72917
[  580.609205] Free swap  = 722720kB
[  580.612513] Total swap = 1048568kB
[  580.619688] 131072 pages RAM
[  580.622586] 9628 pages reserved
[  580.625726] 112220 pages shared
[  580.628868] 58034 pages non-shared

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10  8:32           ` KAMEZAWA Hiroyuki
@ 2009-06-10  9:30             ` Johannes Weiner
  -1 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-10  9:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

On Wed, Jun 10, 2009 at 05:32:49PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 10 Jun 2009 16:11:32 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> > > Hi Fengguang,
> > > 
> > > On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > > > [resend with lists cc'd, sorry]
> > > > > 
> > > > > [and fixed Hugh's email.  crap]
> > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > > > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > > > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > > > > > array of that many swap entries on the stack, but I think it is better
> > > > > > to limit the cluster size to a sane maximum than using dynamic
> > > > > > allocation for this purpose.
> > > > 
> > > > Hi Johannes,
> > > > 
> > > > When stress testing your patch, I found it triggered many OOM kills.
> > > > Around the time of last OOMs, the memory usage is:
> > > > 
> > > >              total       used       free     shared    buffers     cached
> > > > Mem:           474        468          5          0          0        239
> > > > -/+ buffers/cache:        229        244
> > > > Swap:         1023        221        802
> > > 
> > > Wow, that really confused me for a second as we shouldn't read more
> > > pages ahead than without the patch, probably even less under stress.
> > 
> > Yup - swap readahead is much more challenging than sequential readahead,
> > in that it must be accurate enough given some really obscure patterns.
> > 
> > > So the problem has to be a runaway reading.  And indeed, severe
> > > stupidity here:
> > > 
> > > +       window = cluster << PAGE_SHIFT;
> > > +       min = addr & ~(window - 1);
> > > +       max = min + cluster;
> > > +       /*
> > > +        * To keep the locking/highpte mapping simple, stay
> > > +        * within the PTE range of one PMD entry.
> > > +        */
> > > +       limit = addr & PMD_MASK;
> > > +       if (limit > min)
> > > +               min = limit;
> > > +       limit = pmd_addr_end(addr, max);
> > > +       if (limit < max)
> > > +               max = limit;
> > > +       limit = max - min;
> > > 
> > > The mistake is at the initial calculation of max.  It should be
> > > 
> > > 	max = min + window;
> > > 
> > > The resulting problem is that min could get bigger than max when
> > > cluster is bigger than PMD_SHIFT.  Did you use page_cluster == 5?
> > 
> > No I use the default 3.
> > 
> > btw, the mistake reflects bad named variables. How about rename
> >         cluster => pages
> >         window  => bytes
> > ?

Proven twice, fixed in v4.

> > > The initial min is aligned to a value below the PMD boundary and max
> > > based on it with a too small offset, staying below the PMD boundary as
> > > well.  When min is rounded up, this becomes a bit large:
> > > 
> > > 	limit = max - min;
> > > 
> > > So if my brain is already functioning, fixing the initial max should
> > > be enough because either
> > > 
> > > 	o window is smaller than PMD_SIZE, than we won't round down
> > > 	below a PMD boundary in the first place or
> > > 
> > > 	o window is bigger than PMD_SIZE, than we can round down below
> > > 	a PMD boundary but adding window to that is garuanteed to
> > > 	cross the boundary again
> > > 
> > > and thus max is always bigger than min.
> > > 
> > > Fengguang, does this make sense?  If so, the patch below should fix
> > > it.
> > 
> > Too bad, a quick test of the below patch freezes the box..
> > 
> 
> +	window = cluster << PAGE_SHIFT;
> +	min = addr & ~(window - 1);
> +	max = min + cluster;
> 
> max = min + window; # this is fixed. then,
> 
> +	/*
> +	 * To keep the locking/highpte mapping simple, stay
> +	 * within the PTE range of one PMD entry.
> +	 */
> +	limit = addr & PMD_MASK;
> +	if (limit > min)
> +		min = limit;
> +	limit = pmd_addr_end(addr, max);
> +	if (limit < max)
> +		max = limit;
> +	limit = max - min;
> 
> limit = (max - min) >> PAGE_SHIFT;

Head -> desk.

Fixed in v4, thank you.

	Hannes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10  9:30             ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-10  9:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

On Wed, Jun 10, 2009 at 05:32:49PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 10 Jun 2009 16:11:32 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> > > Hi Fengguang,
> > > 
> > > On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > > > [resend with lists cc'd, sorry]
> > > > > 
> > > > > [and fixed Hugh's email.  crap]
> > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > here is a new iteration of the virtual swap readahead.  Per Hugh's
> > > > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > > > ouf swap code.  Unfortunately, I had to bound page_cluster due to an
> > > > > > array of that many swap entries on the stack, but I think it is better
> > > > > > to limit the cluster size to a sane maximum than using dynamic
> > > > > > allocation for this purpose.
> > > > 
> > > > Hi Johannes,
> > > > 
> > > > When stress testing your patch, I found it triggered many OOM kills.
> > > > Around the time of last OOMs, the memory usage is:
> > > > 
> > > >              total       used       free     shared    buffers     cached
> > > > Mem:           474        468          5          0          0        239
> > > > -/+ buffers/cache:        229        244
> > > > Swap:         1023        221        802
> > > 
> > > Wow, that really confused me for a second as we shouldn't read more
> > > pages ahead than without the patch, probably even less under stress.
> > 
> > Yup - swap readahead is much more challenging than sequential readahead,
> > in that it must be accurate enough given some really obscure patterns.
> > 
> > > So the problem has to be a runaway reading.  And indeed, severe
> > > stupidity here:
> > > 
> > > +       window = cluster << PAGE_SHIFT;
> > > +       min = addr & ~(window - 1);
> > > +       max = min + cluster;
> > > +       /*
> > > +        * To keep the locking/highpte mapping simple, stay
> > > +        * within the PTE range of one PMD entry.
> > > +        */
> > > +       limit = addr & PMD_MASK;
> > > +       if (limit > min)
> > > +               min = limit;
> > > +       limit = pmd_addr_end(addr, max);
> > > +       if (limit < max)
> > > +               max = limit;
> > > +       limit = max - min;
> > > 
> > > The mistake is at the initial calculation of max.  It should be
> > > 
> > > 	max = min + window;
> > > 
> > > The resulting problem is that min could get bigger than max when
> > > cluster is bigger than PMD_SHIFT.  Did you use page_cluster == 5?
> > 
> > No I use the default 3.
> > 
> > btw, the mistake reflects bad named variables. How about rename
> >         cluster => pages
> >         window  => bytes
> > ?

Proven twice, fixed in v4.

> > > The initial min is aligned to a value below the PMD boundary and max
> > > based on it with a too small offset, staying below the PMD boundary as
> > > well.  When min is rounded up, this becomes a bit large:
> > > 
> > > 	limit = max - min;
> > > 
> > > So if my brain is already functioning, fixing the initial max should
> > > be enough because either
> > > 
> > > 	o window is smaller than PMD_SIZE, than we won't round down
> > > 	below a PMD boundary in the first place or
> > > 
> > > 	o window is bigger than PMD_SIZE, than we can round down below
> > > 	a PMD boundary but adding window to that is garuanteed to
> > > 	cross the boundary again
> > > 
> > > and thus max is always bigger than min.
> > > 
> > > Fengguang, does this make sense?  If so, the patch below should fix
> > > it.
> > 
> > Too bad, a quick test of the below patch freezes the box..
> > 
> 
> +	window = cluster << PAGE_SHIFT;
> +	min = addr & ~(window - 1);
> +	max = min + cluster;
> 
> max = min + window; # this is fixed. then,
> 
> +	/*
> +	 * To keep the locking/highpte mapping simple, stay
> +	 * within the PTE range of one PMD entry.
> +	 */
> +	limit = addr & PMD_MASK;
> +	if (limit > min)
> +		min = limit;
> +	limit = pmd_addr_end(addr, max);
> +	if (limit < max)
> +		max = limit;
> +	limit = max - min;
> 
> limit = (max - min) >> PAGE_SHIFT;

Head -> desk.

Fixed in v4, thank you.

	Hannes

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10  8:56             ` Wu Fengguang
@ 2009-06-10  9:42               ` Peter Zijlstra
  -1 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2009-06-10  9:42 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm, linux-kernel

On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> 
> Yes it worked!  But then I run into page allocation failures:
> 
> [  340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> [  340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> [  340.651839] Call Trace:
> [  340.654289]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> [  340.660645]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> [  340.671786]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> [  340.678746]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]

Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
allocs.

But order-4 allocs failing isn't really strange, but it might indicate
this patch fragments stuff sooner, although I've seen these particular
failues before.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10  9:42               ` Peter Zijlstra
  0 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2009-06-10  9:42 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm, linux-kernel

On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> 
> Yes it worked!  But then I run into page allocation failures:
> 
> [  340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> [  340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> [  340.651839] Call Trace:
> [  340.654289]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> [  340.660645]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> [  340.671786]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> [  340.678746]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]

Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
allocs.

But order-4 allocs failing isn't really strange, but it might indicate
this patch fragments stuff sooner, although I've seen these particular
failues before.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10  9:42               ` Peter Zijlstra
@ 2009-06-10  9:59                 ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-10  9:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm, linux-kernel,
	Barnes, Jesse

On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > 
> > Yes it worked!  But then I run into page allocation failures:
> > 
> > [  340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> > [  340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> > [  340.651839] Call Trace:
> > [  340.654289]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> > [  340.660645]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > [  340.671786]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > [  340.678746]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> 
> Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
> allocs.
> 
> But order-4 allocs failing isn't really strange, but it might indicate
> this patch fragments stuff sooner, although I've seen these particular
> failues before.

Thanks for the tip. Where is it? I'd like to try it out :)

Despite of the xorg failures, the test was able to complete with the
listed timing. The numbers are the time each program is able to start:

  before       after
    0.02        0.01    N xeyes
    0.76        0.68    N firefox
    1.88        1.89    N nautilus
    3.17        3.25    N nautilus --browser
    4.89        4.98    N gthumb
    6.47        6.79    N gedit
    8.16        8.56    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
   12.55       12.61    N xterm
   14.57       14.99    N mlterm
   17.06       17.16    N gnome-terminal
   18.90       19.60    N urxvt
   23.48       24.26    N gnome-system-monitor
   26.52       27.13    N gnome-help
   29.65       30.29    N gnome-dictionary
   36.12       36.93    N /usr/games/sol
   39.27       39.21    N /usr/games/gnometris
   42.56       43.61    N /usr/games/gnect
   47.03       47.40    N /usr/games/gtali
   52.05       51.41    N /usr/games/iagno
   55.42       56.21    N /usr/games/gnotravex
   61.47       60.58    N /usr/games/mahjongg
   67.11       64.68    N /usr/games/gnome-sudoku
   75.15       72.42    N /usr/games/glines
   79.70       78.61    N /usr/games/glchess
   88.48       87.01    N /usr/games/gnomine
   96.51       95.03    N /usr/games/gnotski
  102.19      100.50    N /usr/games/gnibbles
  114.93      108.97    N /usr/games/gnobots2
  125.02      120.09    N /usr/games/blackjack
  135.11      134.39    N /usr/games/same-gnome
  154.50      159.99    N /usr/bin/gnome-window-properties
  162.09      176.04    N /usr/bin/gnome-default-applications-properties
  173.29      197.12    N /usr/bin/gnome-at-properties
  188.21      221.15    N /usr/bin/gnome-typing-monitor
  199.93      249.38    N /usr/bin/gnome-at-visual
  206.95      272.87    N /usr/bin/gnome-sound-properties
  224.49      302.03    N /usr/bin/gnome-at-mobility
  234.11      325.73    N /usr/bin/gnome-keybinding-properties
  248.59      358.64    N /usr/bin/gnome-about-me
  276.27      402.30    N /usr/bin/gnome-display-properties
  304.39      439.35    N /usr/bin/gnome-network-preferences
  342.01      482.78    N /usr/bin/gnome-mouse-properties
  388.58      528.54    N /usr/bin/gnome-appearance-properties
  508.47      653.12    N /usr/bin/gnome-control-center
  587.57      769.65    N /usr/bin/gnome-keyboard-properties
  758.16     1021.65    N : oocalc
  830.03     1124.14    N : oodraw
  900.03     1246.52    N : ooimpress
  993.91     1370.35    N : oomath
 1081.89     1478.34    N : ooweb
 1161.99     1595.85    N : oowriter

It's slower with the patch. Maybe we shall give it another run with
the vmalloc patch.

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10  9:59                 ` Wu Fengguang
  0 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-10  9:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm, linux-kernel,
	Barnes, Jesse

On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > 
> > Yes it worked!  But then I run into page allocation failures:
> > 
> > [  340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> > [  340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> > [  340.651839] Call Trace:
> > [  340.654289]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> > [  340.660645]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > [  340.671786]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > [  340.678746]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> 
> Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
> allocs.
> 
> But order-4 allocs failing isn't really strange, but it might indicate
> this patch fragments stuff sooner, although I've seen these particular
> failues before.

Thanks for the tip. Where is it? I'd like to try it out :)

Despite of the xorg failures, the test was able to complete with the
listed timing. The numbers are the time each program is able to start:

  before       after
    0.02        0.01    N xeyes
    0.76        0.68    N firefox
    1.88        1.89    N nautilus
    3.17        3.25    N nautilus --browser
    4.89        4.98    N gthumb
    6.47        6.79    N gedit
    8.16        8.56    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
   12.55       12.61    N xterm
   14.57       14.99    N mlterm
   17.06       17.16    N gnome-terminal
   18.90       19.60    N urxvt
   23.48       24.26    N gnome-system-monitor
   26.52       27.13    N gnome-help
   29.65       30.29    N gnome-dictionary
   36.12       36.93    N /usr/games/sol
   39.27       39.21    N /usr/games/gnometris
   42.56       43.61    N /usr/games/gnect
   47.03       47.40    N /usr/games/gtali
   52.05       51.41    N /usr/games/iagno
   55.42       56.21    N /usr/games/gnotravex
   61.47       60.58    N /usr/games/mahjongg
   67.11       64.68    N /usr/games/gnome-sudoku
   75.15       72.42    N /usr/games/glines
   79.70       78.61    N /usr/games/glchess
   88.48       87.01    N /usr/games/gnomine
   96.51       95.03    N /usr/games/gnotski
  102.19      100.50    N /usr/games/gnibbles
  114.93      108.97    N /usr/games/gnobots2
  125.02      120.09    N /usr/games/blackjack
  135.11      134.39    N /usr/games/same-gnome
  154.50      159.99    N /usr/bin/gnome-window-properties
  162.09      176.04    N /usr/bin/gnome-default-applications-properties
  173.29      197.12    N /usr/bin/gnome-at-properties
  188.21      221.15    N /usr/bin/gnome-typing-monitor
  199.93      249.38    N /usr/bin/gnome-at-visual
  206.95      272.87    N /usr/bin/gnome-sound-properties
  224.49      302.03    N /usr/bin/gnome-at-mobility
  234.11      325.73    N /usr/bin/gnome-keybinding-properties
  248.59      358.64    N /usr/bin/gnome-about-me
  276.27      402.30    N /usr/bin/gnome-display-properties
  304.39      439.35    N /usr/bin/gnome-network-preferences
  342.01      482.78    N /usr/bin/gnome-mouse-properties
  388.58      528.54    N /usr/bin/gnome-appearance-properties
  508.47      653.12    N /usr/bin/gnome-control-center
  587.57      769.65    N /usr/bin/gnome-keyboard-properties
  758.16     1021.65    N : oocalc
  830.03     1124.14    N : oodraw
  900.03     1246.52    N : ooimpress
  993.91     1370.35    N : oomath
 1081.89     1478.34    N : ooweb
 1161.99     1595.85    N : oowriter

It's slower with the patch. Maybe we shall give it another run with
the vmalloc patch.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10  9:59                 ` Wu Fengguang
@ 2009-06-10 10:05                   ` Peter Zijlstra
  -1 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2009-06-10 10:05 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm, linux-kernel,
	Barnes, Jesse

On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > 
> > > Yes it worked!  But then I run into page allocation failures:
> > > 
> > > [  340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> > > [  340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> > > [  340.651839] Call Trace:
> > > [  340.654289]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> > > [  340.660645]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > [  340.671786]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > [  340.678746]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > 
> > Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
> > allocs.
> > 
> > But order-4 allocs failing isn't really strange, but it might indicate
> > this patch fragments stuff sooner, although I've seen these particular
> > failues before.
> 
> Thanks for the tip. Where is it? I'd like to try it out :)

commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Fri May 8 16:13:25 2009 -0700

    drm/i915: allocate large pointer arrays with vmalloc



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10 10:05                   ` Peter Zijlstra
  0 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2009-06-10 10:05 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm, linux-kernel,
	Barnes, Jesse

On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > 
> > > Yes it worked!  But then I run into page allocation failures:
> > > 
> > > [  340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> > > [  340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> > > [  340.651839] Call Trace:
> > > [  340.654289]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> > > [  340.660645]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > [  340.671786]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > [  340.678746]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > 
> > Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
> > allocs.
> > 
> > But order-4 allocs failing isn't really strange, but it might indicate
> > this patch fragments stuff sooner, although I've seen these particular
> > failues before.
> 
> Thanks for the tip. Where is it? I'd like to try it out :)

commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Fri May 8 16:13:25 2009 -0700

    drm/i915: allocate large pointer arrays with vmalloc


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10 10:05                   ` Peter Zijlstra
@ 2009-06-10 11:32                     ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-10 11:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm, linux-kernel,
	Barnes, Jesse

On Wed, Jun 10, 2009 at 06:05:14PM +0800, Peter Zijlstra wrote:
> On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> > On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > > 
> > > > Yes it worked!  But then I run into page allocation failures:
> > > > 
> > > > [  340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> > > > [  340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> > > > [  340.651839] Call Trace:
> > > > [  340.654289]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> > > > [  340.660645]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > > [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > > [  340.671786]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > > [  340.678746]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > 
> > > Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
> > > allocs.
> > > 
> > > But order-4 allocs failing isn't really strange, but it might indicate
> > > this patch fragments stuff sooner, although I've seen these particular
> > > failues before.
> > 
> > Thanks for the tip. Where is it? I'd like to try it out :)
> 
> commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
> Author: Jesse Barnes <jbarnes@virtuousgeek.org>
> Date:   Fri May 8 16:13:25 2009 -0700
> 
>     drm/i915: allocate large pointer arrays with vmalloc

Thanks! It is already in the -mm tree, but it missed on conversion :)

I'll retry with this patch tomorrow.

Thanks,
Fengguang
---

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 39f5c65..7132dbe 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3230,8 +3230,8 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	if (args->num_cliprects != 0) {
-		cliprects = drm_calloc(args->num_cliprects, sizeof(*cliprects),
-				       DRM_MEM_DRIVER);
+		cliprects = drm_calloc_large(args->num_cliprects,
+					     sizeof(*cliprects));
 		if (cliprects == NULL)
 			goto pre_mutex_err;
 
@@ -3474,8 +3474,7 @@ err:
 pre_mutex_err:
 	drm_free_large(object_list);
 	drm_free_large(exec_list);
-	drm_free(cliprects, sizeof(*cliprects) * args->num_cliprects,
-		 DRM_MEM_DRIVER);
+	drm_free_large(cliprects);
 
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10 11:32                     ` Wu Fengguang
  0 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-10 11:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm, linux-kernel,
	Barnes, Jesse

On Wed, Jun 10, 2009 at 06:05:14PM +0800, Peter Zijlstra wrote:
> On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> > On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > > 
> > > > Yes it worked!  But then I run into page allocation failures:
> > > > 
> > > > [  340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> > > > [  340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> > > > [  340.651839] Call Trace:
> > > > [  340.654289]  [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> > > > [  340.660645]  [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > > [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > > [  340.671786]  [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > > [  340.678746]  [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > 
> > > Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
> > > allocs.
> > > 
> > > But order-4 allocs failing isn't really strange, but it might indicate
> > > this patch fragments stuff sooner, although I've seen these particular
> > > failues before.
> > 
> > Thanks for the tip. Where is it? I'd like to try it out :)
> 
> commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
> Author: Jesse Barnes <jbarnes@virtuousgeek.org>
> Date:   Fri May 8 16:13:25 2009 -0700
> 
>     drm/i915: allocate large pointer arrays with vmalloc

Thanks! It is already in the -mm tree, but it missed on conversion :)

I'll retry with this patch tomorrow.

Thanks,
Fengguang
---

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 39f5c65..7132dbe 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3230,8 +3230,8 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	if (args->num_cliprects != 0) {
-		cliprects = drm_calloc(args->num_cliprects, sizeof(*cliprects),
-				       DRM_MEM_DRIVER);
+		cliprects = drm_calloc_large(args->num_cliprects,
+					     sizeof(*cliprects));
 		if (cliprects == NULL)
 			goto pre_mutex_err;
 
@@ -3474,8 +3474,7 @@ err:
 pre_mutex_err:
 	drm_free_large(object_list);
 	drm_free_large(exec_list);
-	drm_free(cliprects, sizeof(*cliprects) * args->num_cliprects,
-		 DRM_MEM_DRIVER);
+	drm_free_large(cliprects);
 
 	return ret;
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10 11:32                     ` Wu Fengguang
@ 2009-06-10 17:25                       ` Jesse Barnes
  -1 siblings, 0 replies; 55+ messages in thread
From: Jesse Barnes @ 2009-06-10 17:25 UTC (permalink / raw)
  To: Wu, Fengguang
  Cc: Peter Zijlstra, KAMEZAWA Hiroyuki, Johannes Weiner,
	Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	Minchan Kim, linux-mm, linux-kernel

On Wed, 10 Jun 2009 04:32:14 -0700
"Wu, Fengguang" <fengguang.wu@intel.com> wrote:

> On Wed, Jun 10, 2009 at 06:05:14PM +0800, Peter Zijlstra wrote:
> > On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> > > On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > > > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > > > 
> > > > > Yes it worked!  But then I run into page allocation failures:
> > > > > 
> > > > > [  340.639803] Xorg: page allocation failure. order:4,
> > > > > mode:0x40d0 [  340.645744] Pid: 3258, comm: Xorg Not tainted
> > > > > 2.6.30-rc8-mm1 #303 [  340.651839] Call Trace:
> > > > > [  340.654289]  [<ffffffff810c8204>]
> > > > > __alloc_pages_nodemask+0x344/0x6c0 [  340.660645]
> > > > > [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > > > [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > > > [  340.671786]  [<ffffffffa014bf9f>] ?
> > > > > i915_gem_execbuffer+0x17f/0x11e0 [i915] [  340.678746]
> > > > > [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > > 
> > > > Jesse Barnes had a patch to add a vmalloc fallback to those
> > > > largish kms allocs.
> > > > 
> > > > But order-4 allocs failing isn't really strange, but it might
> > > > indicate this patch fragments stuff sooner, although I've seen
> > > > these particular failues before.
> > > 
> > > Thanks for the tip. Where is it? I'd like to try it out :)
> > 
> > commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
> > Author: Jesse Barnes <jbarnes@virtuousgeek.org>
> > Date:   Fri May 8 16:13:25 2009 -0700
> > 
> >     drm/i915: allocate large pointer arrays with vmalloc
> 
> Thanks! It is already in the -mm tree, but it missed on conversion :)
> 
> I'll retry with this patch tomorrow.
> 
> Thanks,
> Fengguang
> ---
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c
> b/drivers/gpu/drm/i915/i915_gem.c index 39f5c65..7132dbe 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3230,8 +3230,8 @@ i915_gem_execbuffer(struct drm_device *dev,
> void *data, }
>  
>  	if (args->num_cliprects != 0) {
> -		cliprects = drm_calloc(args->num_cliprects,
> sizeof(*cliprects),
> -				       DRM_MEM_DRIVER);
> +		cliprects = drm_calloc_large(args->num_cliprects,
> +					     sizeof(*cliprects));
>  		if (cliprects == NULL)
>  			goto pre_mutex_err;
>  
> @@ -3474,8 +3474,7 @@ err:
>  pre_mutex_err:
>  	drm_free_large(object_list);
>  	drm_free_large(exec_list);
> -	drm_free(cliprects, sizeof(*cliprects) * args->num_cliprects,
> -		 DRM_MEM_DRIVER);
> +	drm_free_large(cliprects);
>  
>  	return ret;
>  }

Kristian posted a fix to my drm_calloc_large function as well; one of
the size checks in drm_calloc_large (the one which decides whether to
use kmalloc or vmalloc) was just checking size instead of size * num,
so you may be hitting that.

Jesse

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-10 17:25                       ` Jesse Barnes
  0 siblings, 0 replies; 55+ messages in thread
From: Jesse Barnes @ 2009-06-10 17:25 UTC (permalink / raw)
  To: Wu, Fengguang
  Cc: Peter Zijlstra, KAMEZAWA Hiroyuki, Johannes Weiner,
	Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	Minchan Kim, linux-mm, linux-kernel

On Wed, 10 Jun 2009 04:32:14 -0700
"Wu, Fengguang" <fengguang.wu@intel.com> wrote:

> On Wed, Jun 10, 2009 at 06:05:14PM +0800, Peter Zijlstra wrote:
> > On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> > > On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > > > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > > > 
> > > > > Yes it worked!  But then I run into page allocation failures:
> > > > > 
> > > > > [  340.639803] Xorg: page allocation failure. order:4,
> > > > > mode:0x40d0 [  340.645744] Pid: 3258, comm: Xorg Not tainted
> > > > > 2.6.30-rc8-mm1 #303 [  340.651839] Call Trace:
> > > > > [  340.654289]  [<ffffffff810c8204>]
> > > > > __alloc_pages_nodemask+0x344/0x6c0 [  340.660645]
> > > > > [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > > > [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > > > [  340.671786]  [<ffffffffa014bf9f>] ?
> > > > > i915_gem_execbuffer+0x17f/0x11e0 [i915] [  340.678746]
> > > > > [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > > 
> > > > Jesse Barnes had a patch to add a vmalloc fallback to those
> > > > largish kms allocs.
> > > > 
> > > > But order-4 allocs failing isn't really strange, but it might
> > > > indicate this patch fragments stuff sooner, although I've seen
> > > > these particular failues before.
> > > 
> > > Thanks for the tip. Where is it? I'd like to try it out :)
> > 
> > commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
> > Author: Jesse Barnes <jbarnes@virtuousgeek.org>
> > Date:   Fri May 8 16:13:25 2009 -0700
> > 
> >     drm/i915: allocate large pointer arrays with vmalloc
> 
> Thanks! It is already in the -mm tree, but it missed on conversion :)
> 
> I'll retry with this patch tomorrow.
> 
> Thanks,
> Fengguang
> ---
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c
> b/drivers/gpu/drm/i915/i915_gem.c index 39f5c65..7132dbe 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3230,8 +3230,8 @@ i915_gem_execbuffer(struct drm_device *dev,
> void *data, }
>  
>  	if (args->num_cliprects != 0) {
> -		cliprects = drm_calloc(args->num_cliprects,
> sizeof(*cliprects),
> -				       DRM_MEM_DRIVER);
> +		cliprects = drm_calloc_large(args->num_cliprects,
> +					     sizeof(*cliprects));
>  		if (cliprects == NULL)
>  			goto pre_mutex_err;
>  
> @@ -3474,8 +3474,7 @@ err:
>  pre_mutex_err:
>  	drm_free_large(object_list);
>  	drm_free_large(exec_list);
> -	drm_free(cliprects, sizeof(*cliprects) * args->num_cliprects,
> -		 DRM_MEM_DRIVER);
> +	drm_free_large(cliprects);
>  
>  	return ret;
>  }

Kristian posted a fix to my drm_calloc_large function as well; one of
the size checks in drm_calloc_large (the one which decides whether to
use kmalloc or vmalloc) was just checking size instead of size * num,
so you may be hitting that.

Jesse

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-10 17:25                       ` Jesse Barnes
@ 2009-06-11  5:22                         ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-11  5:22 UTC (permalink / raw)
  To: Barnes, Jesse
  Cc: Peter Zijlstra, KAMEZAWA Hiroyuki, Johannes Weiner,
	Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	Minchan Kim, linux-mm, linux-kernel

On Thu, Jun 11, 2009 at 01:25:16AM +0800, Barnes, Jesse wrote:
> On Wed, 10 Jun 2009 04:32:14 -0700
> "Wu, Fengguang" <fengguang.wu@intel.com> wrote:
> 
> > On Wed, Jun 10, 2009 at 06:05:14PM +0800, Peter Zijlstra wrote:
> > > On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > > > > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > > > > 
> > > > > > Yes it worked!  But then I run into page allocation failures:
> > > > > > 
> > > > > > [  340.639803] Xorg: page allocation failure. order:4,
> > > > > > mode:0x40d0 [  340.645744] Pid: 3258, comm: Xorg Not tainted
> > > > > > 2.6.30-rc8-mm1 #303 [  340.651839] Call Trace:
> > > > > > [  340.654289]  [<ffffffff810c8204>]
> > > > > > __alloc_pages_nodemask+0x344/0x6c0 [  340.660645]
> > > > > > [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > > > > [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > > > > [  340.671786]  [<ffffffffa014bf9f>] ?
> > > > > > i915_gem_execbuffer+0x17f/0x11e0 [i915] [  340.678746]
> > > > > > [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > > > 
> > > > > Jesse Barnes had a patch to add a vmalloc fallback to those
> > > > > largish kms allocs.
> > > > > 
> > > > > But order-4 allocs failing isn't really strange, but it might
> > > > > indicate this patch fragments stuff sooner, although I've seen
> > > > > these particular failues before.
> > > > 
> > > > Thanks for the tip. Where is it? I'd like to try it out :)
> > > 
> > > commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
> > > Author: Jesse Barnes <jbarnes@virtuousgeek.org>
> > > Date:   Fri May 8 16:13:25 2009 -0700
> > > 
> > >     drm/i915: allocate large pointer arrays with vmalloc
> > 
> > Thanks! It is already in the -mm tree, but it missed on conversion :)
> > 
> > I'll retry with this patch tomorrow.
> > 
> > Thanks,
> > Fengguang
> > ---
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c index 39f5c65..7132dbe 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3230,8 +3230,8 @@ i915_gem_execbuffer(struct drm_device *dev,
> > void *data, }
> >  
> >  	if (args->num_cliprects != 0) {
> > -		cliprects = drm_calloc(args->num_cliprects,
> > sizeof(*cliprects),
> > -				       DRM_MEM_DRIVER);
> > +		cliprects = drm_calloc_large(args->num_cliprects,
> > +					     sizeof(*cliprects));
> >  		if (cliprects == NULL)
> >  			goto pre_mutex_err;
> >  
> > @@ -3474,8 +3474,7 @@ err:
> >  pre_mutex_err:
> >  	drm_free_large(object_list);
> >  	drm_free_large(exec_list);
> > -	drm_free(cliprects, sizeof(*cliprects) * args->num_cliprects,
> > -		 DRM_MEM_DRIVER);
> > +	drm_free_large(cliprects);
> >  
> >  	return ret;
> >  }
> 
> Kristian posted a fix to my drm_calloc_large function as well; one of
> the size checks in drm_calloc_large (the one which decides whether to
> use kmalloc or vmalloc) was just checking size instead of size * num,
> so you may be hitting that.

Yes, it is.

Unfortunately, after fixing it up the swap readahead patch still performs slow
(even worse this time):

  before       after
    0.02        0.01    N xeyes
    0.76        0.89    N firefox
    1.88        2.21    N nautilus
    3.17        3.41    N nautilus --browser
    4.89        5.20    N gthumb
    6.47        7.02    N gedit
    8.16        8.90    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
   12.55       13.36    N xterm
   14.57       15.57    N mlterm
   17.06       18.11    N gnome-terminal
   18.90       20.37    N urxvt
   23.48       25.26    N gnome-system-monitor
   26.52       27.84    N gnome-help
   29.65       31.93    N gnome-dictionary
   36.12       37.74    N /usr/games/sol
   39.27       40.61    N /usr/games/gnometris
   42.56       43.75    N /usr/games/gnect
   47.03       47.85    N /usr/games/gtali
   52.05       52.31    N /usr/games/iagno
   55.42       55.61    N /usr/games/gnotravex
   61.47       61.38    N /usr/games/mahjongg
   67.11       65.07    N /usr/games/gnome-sudoku
   75.15       70.36    N /usr/games/glines
   79.70       74.96    N /usr/games/glchess
   88.48       80.82    N /usr/games/gnomine
   96.51       88.30    N /usr/games/gnotski
  102.19       94.26    N /usr/games/gnibbles
  114.93      102.02    N /usr/games/gnobots2
  125.02      115.23    N /usr/games/blackjack
  135.11      128.41    N /usr/games/same-gnome
  154.50      153.05    N /usr/bin/gnome-window-properties
  162.09      169.53    N /usr/bin/gnome-default-applications-properties
  173.29      190.32    N /usr/bin/gnome-at-properties
  188.21      212.70    N /usr/bin/gnome-typing-monitor
  199.93      236.18    N /usr/bin/gnome-at-visual
  206.95      261.88    N /usr/bin/gnome-sound-properties
  224.49      304.66    N /usr/bin/gnome-at-mobility
  234.11      336.73    N /usr/bin/gnome-keybinding-properties
  248.59      374.03    N /usr/bin/gnome-about-me
  276.27      433.86    N /usr/bin/gnome-display-properties
  304.39      488.43    N /usr/bin/gnome-network-preferences
  342.01      686.68    N /usr/bin/gnome-mouse-properties
  388.58      769.21    N /usr/bin/gnome-appearance-properties
  508.47      933.35    N /usr/bin/gnome-control-center
  587.57     1193.27    N /usr/bin/gnome-keyboard-properties
 [...]

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-11  5:22                         ` Wu Fengguang
  0 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-11  5:22 UTC (permalink / raw)
  To: Barnes, Jesse
  Cc: Peter Zijlstra, KAMEZAWA Hiroyuki, Johannes Weiner,
	Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	Minchan Kim, linux-mm, linux-kernel

On Thu, Jun 11, 2009 at 01:25:16AM +0800, Barnes, Jesse wrote:
> On Wed, 10 Jun 2009 04:32:14 -0700
> "Wu, Fengguang" <fengguang.wu@intel.com> wrote:
> 
> > On Wed, Jun 10, 2009 at 06:05:14PM +0800, Peter Zijlstra wrote:
> > > On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > > > > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > > > > 
> > > > > > Yes it worked!  But then I run into page allocation failures:
> > > > > > 
> > > > > > [  340.639803] Xorg: page allocation failure. order:4,
> > > > > > mode:0x40d0 [  340.645744] Pid: 3258, comm: Xorg Not tainted
> > > > > > 2.6.30-rc8-mm1 #303 [  340.651839] Call Trace:
> > > > > > [  340.654289]  [<ffffffff810c8204>]
> > > > > > __alloc_pages_nodemask+0x344/0x6c0 [  340.660645]
> > > > > > [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > > > > [  340.666472]  [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > > > > [  340.671786]  [<ffffffffa014bf9f>] ?
> > > > > > i915_gem_execbuffer+0x17f/0x11e0 [i915] [  340.678746]
> > > > > > [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > > > 
> > > > > Jesse Barnes had a patch to add a vmalloc fallback to those
> > > > > largish kms allocs.
> > > > > 
> > > > > But order-4 allocs failing isn't really strange, but it might
> > > > > indicate this patch fragments stuff sooner, although I've seen
> > > > > these particular failues before.
> > > > 
> > > > Thanks for the tip. Where is it? I'd like to try it out :)
> > > 
> > > commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
> > > Author: Jesse Barnes <jbarnes@virtuousgeek.org>
> > > Date:   Fri May 8 16:13:25 2009 -0700
> > > 
> > >     drm/i915: allocate large pointer arrays with vmalloc
> > 
> > Thanks! It is already in the -mm tree, but it missed on conversion :)
> > 
> > I'll retry with this patch tomorrow.
> > 
> > Thanks,
> > Fengguang
> > ---
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c index 39f5c65..7132dbe 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3230,8 +3230,8 @@ i915_gem_execbuffer(struct drm_device *dev,
> > void *data, }
> >  
> >  	if (args->num_cliprects != 0) {
> > -		cliprects = drm_calloc(args->num_cliprects,
> > sizeof(*cliprects),
> > -				       DRM_MEM_DRIVER);
> > +		cliprects = drm_calloc_large(args->num_cliprects,
> > +					     sizeof(*cliprects));
> >  		if (cliprects == NULL)
> >  			goto pre_mutex_err;
> >  
> > @@ -3474,8 +3474,7 @@ err:
> >  pre_mutex_err:
> >  	drm_free_large(object_list);
> >  	drm_free_large(exec_list);
> > -	drm_free(cliprects, sizeof(*cliprects) * args->num_cliprects,
> > -		 DRM_MEM_DRIVER);
> > +	drm_free_large(cliprects);
> >  
> >  	return ret;
> >  }
> 
> Kristian posted a fix to my drm_calloc_large function as well; one of
> the size checks in drm_calloc_large (the one which decides whether to
> use kmalloc or vmalloc) was just checking size instead of size * num,
> so you may be hitting that.

Yes, it is.

Unfortunately, after fixing it up the swap readahead patch still performs slow
(even worse this time):

  before       after
    0.02        0.01    N xeyes
    0.76        0.89    N firefox
    1.88        2.21    N nautilus
    3.17        3.41    N nautilus --browser
    4.89        5.20    N gthumb
    6.47        7.02    N gedit
    8.16        8.90    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
   12.55       13.36    N xterm
   14.57       15.57    N mlterm
   17.06       18.11    N gnome-terminal
   18.90       20.37    N urxvt
   23.48       25.26    N gnome-system-monitor
   26.52       27.84    N gnome-help
   29.65       31.93    N gnome-dictionary
   36.12       37.74    N /usr/games/sol
   39.27       40.61    N /usr/games/gnometris
   42.56       43.75    N /usr/games/gnect
   47.03       47.85    N /usr/games/gtali
   52.05       52.31    N /usr/games/iagno
   55.42       55.61    N /usr/games/gnotravex
   61.47       61.38    N /usr/games/mahjongg
   67.11       65.07    N /usr/games/gnome-sudoku
   75.15       70.36    N /usr/games/glines
   79.70       74.96    N /usr/games/glchess
   88.48       80.82    N /usr/games/gnomine
   96.51       88.30    N /usr/games/gnotski
  102.19       94.26    N /usr/games/gnibbles
  114.93      102.02    N /usr/games/gnobots2
  125.02      115.23    N /usr/games/blackjack
  135.11      128.41    N /usr/games/same-gnome
  154.50      153.05    N /usr/bin/gnome-window-properties
  162.09      169.53    N /usr/bin/gnome-default-applications-properties
  173.29      190.32    N /usr/bin/gnome-at-properties
  188.21      212.70    N /usr/bin/gnome-typing-monitor
  199.93      236.18    N /usr/bin/gnome-at-visual
  206.95      261.88    N /usr/bin/gnome-sound-properties
  224.49      304.66    N /usr/bin/gnome-at-mobility
  234.11      336.73    N /usr/bin/gnome-keybinding-properties
  248.59      374.03    N /usr/bin/gnome-about-me
  276.27      433.86    N /usr/bin/gnome-display-properties
  304.39      488.43    N /usr/bin/gnome-network-preferences
  342.01      686.68    N /usr/bin/gnome-mouse-properties
  388.58      769.21    N /usr/bin/gnome-appearance-properties
  508.47      933.35    N /usr/bin/gnome-control-center
  587.57     1193.27    N /usr/bin/gnome-keyboard-properties
 [...]

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-09 19:01 ` Johannes Weiner
@ 2009-06-11  5:31   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 55+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-11  5:31 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Tue, 9 Jun 2009 21:01:28 +0200
Johannes Weiner <hannes@cmpxchg.org> wrote:
> [resend with lists cc'd, sorry]
> 
> +static int swap_readahead_ptes(struct mm_struct *mm,
> +			unsigned long addr, pmd_t *pmd,
> +			swp_entry_t *entries,
> +			unsigned long cluster)
> +{
> +	unsigned long window, min, max, limit;
> +	spinlock_t *ptl;
> +	pte_t *ptep;
> +	int i, nr;
> +
> +	window = cluster << PAGE_SHIFT;
> +	min = addr & ~(window - 1);
> +	max = min + cluster;

Johannes, I wonder there is no reason to use "alignment".
I think we just need to read "nearby" pages. Then, this function's
scan range should be

	[addr - window/2, addr + window/2)
or some.

And here, too
> +	if (!entries)	/* XXX: shmem case */
> +		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> +	pmin = swp_offset(entry) & ~(cluster - 1);
> +	pmax = pmin + cluster;

pmin = swp_offset(entry) - cluster/2.
pmax = swp_offset(entry) + cluster/2.

I'm sorry if I miss a reason for using "alignment".

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-11  5:31   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 55+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-11  5:31 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Tue, 9 Jun 2009 21:01:28 +0200
Johannes Weiner <hannes@cmpxchg.org> wrote:
> [resend with lists cc'd, sorry]
> 
> +static int swap_readahead_ptes(struct mm_struct *mm,
> +			unsigned long addr, pmd_t *pmd,
> +			swp_entry_t *entries,
> +			unsigned long cluster)
> +{
> +	unsigned long window, min, max, limit;
> +	spinlock_t *ptl;
> +	pte_t *ptep;
> +	int i, nr;
> +
> +	window = cluster << PAGE_SHIFT;
> +	min = addr & ~(window - 1);
> +	max = min + cluster;

Johannes, I wonder there is no reason to use "alignment".
I think we just need to read "nearby" pages. Then, this function's
scan range should be

	[addr - window/2, addr + window/2)
or some.

And here, too
> +	if (!entries)	/* XXX: shmem case */
> +		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> +	pmin = swp_offset(entry) & ~(cluster - 1);
> +	pmax = pmin + cluster;

pmin = swp_offset(entry) - cluster/2.
pmax = swp_offset(entry) + cluster/2.

I'm sorry if I miss a reason for using "alignment".

Thanks,
-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-11  5:22                         ` Wu Fengguang
@ 2009-06-11 10:17                           ` Johannes Weiner
  -1 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-11 10:17 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> Unfortunately, after fixing it up the swap readahead patch still performs slow
> (even worse this time):

Thanks for doing the tests.  Do you know if the time difference comes
from IO or CPU time?

Because one reason I could think of is that the original code walks
the readaround window in two directions, starting from the target each
time but immediately stops when it encounters a hole where the new
code just skips holes but doesn't abort readaround and thus might
indeed read more slots.

I have an old patch flying around that changed the physical ra code to
use a bitmap that is able to represent holes.  If the increased time
is waiting for IO, I would be interested if that patch has the same
negative impact.

	Hannes

>   before       after
>     0.02        0.01    N xeyes
>     0.76        0.89    N firefox
>     1.88        2.21    N nautilus
>     3.17        3.41    N nautilus --browser
>     4.89        5.20    N gthumb
>     6.47        7.02    N gedit
>     8.16        8.90    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
>    12.55       13.36    N xterm
>    14.57       15.57    N mlterm
>    17.06       18.11    N gnome-terminal
>    18.90       20.37    N urxvt
>    23.48       25.26    N gnome-system-monitor
>    26.52       27.84    N gnome-help
>    29.65       31.93    N gnome-dictionary
>    36.12       37.74    N /usr/games/sol
>    39.27       40.61    N /usr/games/gnometris
>    42.56       43.75    N /usr/games/gnect
>    47.03       47.85    N /usr/games/gtali
>    52.05       52.31    N /usr/games/iagno
>    55.42       55.61    N /usr/games/gnotravex
>    61.47       61.38    N /usr/games/mahjongg
>    67.11       65.07    N /usr/games/gnome-sudoku
>    75.15       70.36    N /usr/games/glines
>    79.70       74.96    N /usr/games/glchess
>    88.48       80.82    N /usr/games/gnomine
>    96.51       88.30    N /usr/games/gnotski
>   102.19       94.26    N /usr/games/gnibbles
>   114.93      102.02    N /usr/games/gnobots2
>   125.02      115.23    N /usr/games/blackjack
>   135.11      128.41    N /usr/games/same-gnome
>   154.50      153.05    N /usr/bin/gnome-window-properties
>   162.09      169.53    N /usr/bin/gnome-default-applications-properties
>   173.29      190.32    N /usr/bin/gnome-at-properties
>   188.21      212.70    N /usr/bin/gnome-typing-monitor
>   199.93      236.18    N /usr/bin/gnome-at-visual
>   206.95      261.88    N /usr/bin/gnome-sound-properties
>   224.49      304.66    N /usr/bin/gnome-at-mobility
>   234.11      336.73    N /usr/bin/gnome-keybinding-properties
>   248.59      374.03    N /usr/bin/gnome-about-me
>   276.27      433.86    N /usr/bin/gnome-display-properties
>   304.39      488.43    N /usr/bin/gnome-network-preferences
>   342.01      686.68    N /usr/bin/gnome-mouse-properties
>   388.58      769.21    N /usr/bin/gnome-appearance-properties
>   508.47      933.35    N /usr/bin/gnome-control-center
>   587.57     1193.27    N /usr/bin/gnome-keyboard-properties

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-11 10:17                           ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-11 10:17 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> Unfortunately, after fixing it up the swap readahead patch still performs slow
> (even worse this time):

Thanks for doing the tests.  Do you know if the time difference comes
from IO or CPU time?

Because one reason I could think of is that the original code walks
the readaround window in two directions, starting from the target each
time but immediately stops when it encounters a hole where the new
code just skips holes but doesn't abort readaround and thus might
indeed read more slots.

I have an old patch flying around that changed the physical ra code to
use a bitmap that is able to represent holes.  If the increased time
is waiting for IO, I would be interested if that patch has the same
negative impact.

	Hannes

>   before       after
>     0.02        0.01    N xeyes
>     0.76        0.89    N firefox
>     1.88        2.21    N nautilus
>     3.17        3.41    N nautilus --browser
>     4.89        5.20    N gthumb
>     6.47        7.02    N gedit
>     8.16        8.90    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
>    12.55       13.36    N xterm
>    14.57       15.57    N mlterm
>    17.06       18.11    N gnome-terminal
>    18.90       20.37    N urxvt
>    23.48       25.26    N gnome-system-monitor
>    26.52       27.84    N gnome-help
>    29.65       31.93    N gnome-dictionary
>    36.12       37.74    N /usr/games/sol
>    39.27       40.61    N /usr/games/gnometris
>    42.56       43.75    N /usr/games/gnect
>    47.03       47.85    N /usr/games/gtali
>    52.05       52.31    N /usr/games/iagno
>    55.42       55.61    N /usr/games/gnotravex
>    61.47       61.38    N /usr/games/mahjongg
>    67.11       65.07    N /usr/games/gnome-sudoku
>    75.15       70.36    N /usr/games/glines
>    79.70       74.96    N /usr/games/glchess
>    88.48       80.82    N /usr/games/gnomine
>    96.51       88.30    N /usr/games/gnotski
>   102.19       94.26    N /usr/games/gnibbles
>   114.93      102.02    N /usr/games/gnobots2
>   125.02      115.23    N /usr/games/blackjack
>   135.11      128.41    N /usr/games/same-gnome
>   154.50      153.05    N /usr/bin/gnome-window-properties
>   162.09      169.53    N /usr/bin/gnome-default-applications-properties
>   173.29      190.32    N /usr/bin/gnome-at-properties
>   188.21      212.70    N /usr/bin/gnome-typing-monitor
>   199.93      236.18    N /usr/bin/gnome-at-visual
>   206.95      261.88    N /usr/bin/gnome-sound-properties
>   224.49      304.66    N /usr/bin/gnome-at-mobility
>   234.11      336.73    N /usr/bin/gnome-keybinding-properties
>   248.59      374.03    N /usr/bin/gnome-about-me
>   276.27      433.86    N /usr/bin/gnome-display-properties
>   304.39      488.43    N /usr/bin/gnome-network-preferences
>   342.01      686.68    N /usr/bin/gnome-mouse-properties
>   388.58      769.21    N /usr/bin/gnome-appearance-properties
>   508.47      933.35    N /usr/bin/gnome-control-center
>   587.57     1193.27    N /usr/bin/gnome-keyboard-properties

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-11 10:17                           ` Johannes Weiner
@ 2009-06-12  1:59                             ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-12  1:59 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > (even worse this time):
> 
> Thanks for doing the tests.  Do you know if the time difference comes
> from IO or CPU time?
> 
> Because one reason I could think of is that the original code walks
> the readaround window in two directions, starting from the target each
> time but immediately stops when it encounters a hole where the new
> code just skips holes but doesn't abort readaround and thus might
> indeed read more slots.
> 
> I have an old patch flying around that changed the physical ra code to
> use a bitmap that is able to represent holes.  If the increased time
> is waiting for IO, I would be interested if that patch has the same
> negative impact.

You can send me the patch :)

But for this patch it is IO bound. The CPU iowait field actually is
going up as the test goes on:

wfg@hp ~% dstat 10
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  3   3  89   4   0   1|  18k   27B|   0     0 |   0     0 |1530  1006
  0   1  99   0   0   0|   0     0 |  31k 9609B|   0     0 |1071   444
  1   1  97   1   0   0|   0     0 |  57k   13k|   0     0 |1139   870
 30  31  24  13   0   3|   0   741k|1648k  294k|   0   370k|3666    10k
 27  30  26  14   0   3| 361k 3227k|1264k  262k| 180k 1614k|3471  9457
 25  25  29  18   0   2| 479k 4102k|2353k  285k| 240k 2051k|3707  9429
 39  44   5   8   0   4| 256k 7646k|2711k  564k| 128k 3823k|7055    13k
 33  18  17  30   0   2|1654k 4357k|2565k  306k| 830k 2366k|4033    10k
 25  17  25  31   0   2|1130k 4053k|2540k  312k| 562k 1838k|3906  9722
 26  17  15  38   0   3|2481k 7118k|3870k  456k|1244k 3559k|5301    11k
 21  12  15  49   0   3|2406k 5041k|4389k  371k|1206k 2818k|4684  8747
 26  15  12  42   0   4|3582k 7320k|5002k  484k|1784k 3362k|5675  9934
 26  19  17  35   0   3|2412k 3452k|3165k  300k|1209k 1726k|4090  8727
 26  15  13  43   0   3|2531k 5294k|3727k  350k|1281k 2738k|4570  8857
 19  13   5  60   0   4|5471k 5148k|4661k  354k|2736k 2484k|4563  8084
 16   9  10  62   0   2|3656k 1818k|3464k  189k|1815k  948k|3121  5361
 22  15   5  54   0   4|5016k 3176k|5773k  412k|2524k 1549k|5337    10k
 20  12   9  57   0   3|2277k 1528k|3405k  288k|1120k  764k|3786  7112
 15   9   4  69   0   3|4410k 2786k|4233k  311k|2228k 1411k|4115  6685
 20  12  10  56   0   2|3765k 1953k|2490k  159k|1863k  964k|2550  6832
 26  14  22  36   0   2|1709k  569k|2969k  219k| 848k  279k|3229  8640
 16  11   7  63   0   3|4095k 2934k|4986k  316k|2047k 1471k|4413  7165
 18  11   3  66   0   3|4219k 1238k|3623k  247k|2119k  616k|3767  6728
 16  12   5  64   0   3|4122k 2278k|4400k  343k|2066k 1184k|4325  7220
 15  11   5  66   0   3|3715k 1467k|4760k  282k|1858k  824k|4130  5918
  7   9   0  80   0   3|4986k 2773k|5811k  328k|2652k 1255k|4244  5173
  9   6  10  74   0   2|4465k  846k|2100k  116k|2061k  420k|2106  2349
 13   8  12  63   0   4|3813k 2607k|5926k  365k|1917k 1309k|4588  5611
  6   6   0  84   0   3|3898k 1206k|4807k  236k|1976k  983k|3477  4210  missed 2 ticks
  6   4   6  83   0   1|4312k 1281k| 679k   58k|2118k  255k|1618  2035
 15   9  18  55   0   4|3489k 1354k|5087k  323k|1746k  713k|4396  5182
  9   5   2  82   0   2|4026k 1134k|1792k  101k|2020k  548k|2183  3555
 14  13   3  66   0   4|3269k 1974k|8776k  476k|1642k 1074k|5937  7077
 10   8   3  77   0   2|4211k 1192k|3227k  196k|2092k  492k|3098  4070
  7   6   7  78   0   3|3672k 2268k|4879k  234k|1833k 1134k|3490  3608
  8   7   6  74   0   4|3782k 2708k|5389k  309k|1902k 1357k|4026  4887
  1   6   0  91   0   2|4662k   33k|1720k  145k|2357k  117k|2587  2066
  3  11   0  85   0   1|4285k  941k|1506k   78k|2118k  431k|2026  1968
  5   8   0  83   0   4|4463k 3075k|5975k  364k|2219k 1729k|4167  4147
  3   4   5  86   0   2|4004k  834k|2943k  137k|2027k  161k|2518  2195
  3   3   0  93   0   2|3016k  974k|1979k   93k|1490k  676k|2034  1717
  7   5   2  85   0   2|4066k 2286k|2617k  195k|2047k  954k|2955  3344
  8   6   7  77   0   3|4247k 2599k|3422k  252k|2108k 1300k|3623  3129
  8   4  12  72   0   3|4056k 1235k|4237k  201k|2028k  618k|3190  2675
  5   7   0  84   0   3|3789k 1222k|5824k  314k|1955k  612k|3758  5173
  0   5   0  94   0   1|3544k  418k| 646k   29k|1744k  216k|1527   989
  1   3   0  94   0   2|3263k  263k|2193k  105k|1614k  165k|2173  1673
  2  13   0  83   0   2|3252k 1124k|2546k  200k|1612k  521k|2832  2386
  3  34   0  59   0   3|2959k  342k|7795k  325k|1472k  171k|4462  3451
  5  22   2  67   0   4|2898k 1534k|  10M  452k|1452k  767k|4380  4124
  9  12  12  66   0   2|3530k  479k|2890k  140k|1764k  240k|2453  2538
  6   6  12  74   0   2|3334k 2631k|2660k  122k|1672k 1546k|2480  2070  missed 2 ticks
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  9   3  21  65   0   2|3750k  765k|3169k  134k|1872k  152k|2273  1921
  5   6   1  83   0   4|3618k 1295k|6543k  330k|1891k  648k|4030  4131
  3   5   2  87   0   2|3600k 1054k|2851k  173k|1720k  527k|2815  2687
  4   7   1  83   0   5|3677k 1344k|6024k  314k|1844k  734k|3877  4376
  4   5   3  85   0   3|3953k  933k|3196k  152k|1989k  405k|2618  2321
  2   3   0  94   0   1|3106k  131k| 486k   24k|1544k  131k|1466  1374
  2   3   0  93   0   1|3089k  672k|1454k   65k|1540k  362k|1825  1909
  7   4   2  86   0   1|3393k  878k|1503k   84k|1694k  416k|1882  2033
  9   3  25  62   0   2|3496k 1833k|1979k   90k|1748k  848k|2112  1797
  6   4   3  84   0   3|3592k  861k|4340k  191k|1795k  432k|2926  3143
  4   6   0  87   0   3|3399k  847k|3758k  186k|1740k  440k|2699  4299
  1   2   0  97   0   1|2807k  365k| 685k   49k|1394k  168k|1175   840  missed 2 ticks
  2   3   4  90   0   2|3183k  801k|2022k   87k|1568k  399k|1998  1561
  2   3   2  91   0   2|3014k  726k|2214k   96k|1521k  368k|2072  1652
  4   5   2  86   0   3|3344k 1686k|4970k  217k|1659k  838k|3209  2936
  8   4  17  69   0   2|3026k  741k|1923k  107k|1510k  370k|1993  2227
  8   4  23  63   0   2|3496k 1026k|2948k  129k|1754k  513k|2347  2048
  6   7   2  81   0   4|3438k 1222k|5658k  272k|1746k  626k|3740  5708
  0   5   0  94   0   1|2902k   30k|1012k   43k|1435k    0 |1637  1161
  1   2   2  93   0   1|2968k  102k| 985k   59k|1471k  122k|1402  1101
  4   5   1  88   0   3|3651k 1814k|3838k  170k|1840k  841k|2769  2382
  2   2   1  94   0   1|2570k  344k| 500k   23k|1283k  214k|1360  1299
  5   3   2  89   0   1|2728k  964k|1119k   70k|1378k  450k|1760  2024
  8   3  24  64   0   1|2993k  967k| 737k   29k|1470k  468k|1432  1251
 12   2  37  48   0   1|2547k  710k| 651k   26k|1274k  360k|1435  1199
  9   3  26  60   0   2|3218k 1630k|3540k  153k|1612k  847k|2723  2174
  3   4   5  85   0   3|3618k  870k|3796k  168k|1807k  414k|2653  2497
  4   5   0  90   0   1|3134k  841k|1489k   81k|1591k  419k|1972  3498
  1   2   0  97   0   1|2910k  349k| 816k   55k|1438k  191k|1525  1096
  3   4   2  89   0   2|3240k  930k|2779k  122k|1610k  433k|2313  2036
  4   5   0  89   0   2|3079k 1340k|4054k  184k|1549k  670k|2981  3567
  2   6   1  90   0   1|2702k  256k|1080k   50k|1348k  178k|1658  1413
  3   4   6  85   0   2|3798k 1128k|2208k  105k|1890k  513k|2194  1984
 10   3  33  53   0   1|3619k 1239k|1147k   50k|1821k  620k|1708  1563
  7   5  12  73   0   3|3689k 1795k|3633k  185k|1833k  898k|2744  2404  missed 2 ticks
  4   4   4  85   0   3|3309k  282k|3728k  168k|1662k  166k|2661  2891
  2  11   0  84   0   2|2989k  195k|3949k  186k|1530k   92k|2528  3687
  0   2   0  96   0   1|2576k   67k|1148k   67k|1278k   40k|1668  1124
  1   2   0  95   0   2|2680k  896k|2093k   94k|1317k  548k|2088  1564
  1   2   0  95   0   1|2938k  809k|1769k   72k|1461k  279k|1825  1385
  2   3   3  90   0   2|3099k 1158k|2854k  125k|1562k  611k|2317  1841
  4   4   1  90   0   2|2806k  670k|2139k   94k|1398k  303k|2096  2173
  9   5  11  73   0   2|2930k 1646k|2741k  122k|1454k  823k|2504  2515
 11   3  29  56   0   1|3154k 1049k|1453k   85k|1578k  524k|1849  1599
  5   4   5  84   0   2|3135k  489k|3718k  161k|1570k  268k|2806  2712
  3   4   2  90   0   1|3010k  513k|1514k   82k|1530k  233k|1936  2989
  3   4   0  91   0   2|2891k  378k|3174k  148k|1430k  196k|2562  2776
  2  12   0  83   0   2|3146k  310k|3730k  184k|1569k  149k|2399  2101
  3   3   0  93   0   1|2491k  358k|1628k   73k|1245k  179k|1837  1755

  Thanks,
  Fengguang

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-12  1:59                             ` Wu Fengguang
  0 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-12  1:59 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > (even worse this time):
> 
> Thanks for doing the tests.  Do you know if the time difference comes
> from IO or CPU time?
> 
> Because one reason I could think of is that the original code walks
> the readaround window in two directions, starting from the target each
> time but immediately stops when it encounters a hole where the new
> code just skips holes but doesn't abort readaround and thus might
> indeed read more slots.
> 
> I have an old patch flying around that changed the physical ra code to
> use a bitmap that is able to represent holes.  If the increased time
> is waiting for IO, I would be interested if that patch has the same
> negative impact.

You can send me the patch :)

But for this patch it is IO bound. The CPU iowait field actually is
going up as the test goes on:

wfg@hp ~% dstat 10
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  3   3  89   4   0   1|  18k   27B|   0     0 |   0     0 |1530  1006
  0   1  99   0   0   0|   0     0 |  31k 9609B|   0     0 |1071   444
  1   1  97   1   0   0|   0     0 |  57k   13k|   0     0 |1139   870
 30  31  24  13   0   3|   0   741k|1648k  294k|   0   370k|3666    10k
 27  30  26  14   0   3| 361k 3227k|1264k  262k| 180k 1614k|3471  9457
 25  25  29  18   0   2| 479k 4102k|2353k  285k| 240k 2051k|3707  9429
 39  44   5   8   0   4| 256k 7646k|2711k  564k| 128k 3823k|7055    13k
 33  18  17  30   0   2|1654k 4357k|2565k  306k| 830k 2366k|4033    10k
 25  17  25  31   0   2|1130k 4053k|2540k  312k| 562k 1838k|3906  9722
 26  17  15  38   0   3|2481k 7118k|3870k  456k|1244k 3559k|5301    11k
 21  12  15  49   0   3|2406k 5041k|4389k  371k|1206k 2818k|4684  8747
 26  15  12  42   0   4|3582k 7320k|5002k  484k|1784k 3362k|5675  9934
 26  19  17  35   0   3|2412k 3452k|3165k  300k|1209k 1726k|4090  8727
 26  15  13  43   0   3|2531k 5294k|3727k  350k|1281k 2738k|4570  8857
 19  13   5  60   0   4|5471k 5148k|4661k  354k|2736k 2484k|4563  8084
 16   9  10  62   0   2|3656k 1818k|3464k  189k|1815k  948k|3121  5361
 22  15   5  54   0   4|5016k 3176k|5773k  412k|2524k 1549k|5337    10k
 20  12   9  57   0   3|2277k 1528k|3405k  288k|1120k  764k|3786  7112
 15   9   4  69   0   3|4410k 2786k|4233k  311k|2228k 1411k|4115  6685
 20  12  10  56   0   2|3765k 1953k|2490k  159k|1863k  964k|2550  6832
 26  14  22  36   0   2|1709k  569k|2969k  219k| 848k  279k|3229  8640
 16  11   7  63   0   3|4095k 2934k|4986k  316k|2047k 1471k|4413  7165
 18  11   3  66   0   3|4219k 1238k|3623k  247k|2119k  616k|3767  6728
 16  12   5  64   0   3|4122k 2278k|4400k  343k|2066k 1184k|4325  7220
 15  11   5  66   0   3|3715k 1467k|4760k  282k|1858k  824k|4130  5918
  7   9   0  80   0   3|4986k 2773k|5811k  328k|2652k 1255k|4244  5173
  9   6  10  74   0   2|4465k  846k|2100k  116k|2061k  420k|2106  2349
 13   8  12  63   0   4|3813k 2607k|5926k  365k|1917k 1309k|4588  5611
  6   6   0  84   0   3|3898k 1206k|4807k  236k|1976k  983k|3477  4210  missed 2 ticks
  6   4   6  83   0   1|4312k 1281k| 679k   58k|2118k  255k|1618  2035
 15   9  18  55   0   4|3489k 1354k|5087k  323k|1746k  713k|4396  5182
  9   5   2  82   0   2|4026k 1134k|1792k  101k|2020k  548k|2183  3555
 14  13   3  66   0   4|3269k 1974k|8776k  476k|1642k 1074k|5937  7077
 10   8   3  77   0   2|4211k 1192k|3227k  196k|2092k  492k|3098  4070
  7   6   7  78   0   3|3672k 2268k|4879k  234k|1833k 1134k|3490  3608
  8   7   6  74   0   4|3782k 2708k|5389k  309k|1902k 1357k|4026  4887
  1   6   0  91   0   2|4662k   33k|1720k  145k|2357k  117k|2587  2066
  3  11   0  85   0   1|4285k  941k|1506k   78k|2118k  431k|2026  1968
  5   8   0  83   0   4|4463k 3075k|5975k  364k|2219k 1729k|4167  4147
  3   4   5  86   0   2|4004k  834k|2943k  137k|2027k  161k|2518  2195
  3   3   0  93   0   2|3016k  974k|1979k   93k|1490k  676k|2034  1717
  7   5   2  85   0   2|4066k 2286k|2617k  195k|2047k  954k|2955  3344
  8   6   7  77   0   3|4247k 2599k|3422k  252k|2108k 1300k|3623  3129
  8   4  12  72   0   3|4056k 1235k|4237k  201k|2028k  618k|3190  2675
  5   7   0  84   0   3|3789k 1222k|5824k  314k|1955k  612k|3758  5173
  0   5   0  94   0   1|3544k  418k| 646k   29k|1744k  216k|1527   989
  1   3   0  94   0   2|3263k  263k|2193k  105k|1614k  165k|2173  1673
  2  13   0  83   0   2|3252k 1124k|2546k  200k|1612k  521k|2832  2386
  3  34   0  59   0   3|2959k  342k|7795k  325k|1472k  171k|4462  3451
  5  22   2  67   0   4|2898k 1534k|  10M  452k|1452k  767k|4380  4124
  9  12  12  66   0   2|3530k  479k|2890k  140k|1764k  240k|2453  2538
  6   6  12  74   0   2|3334k 2631k|2660k  122k|1672k 1546k|2480  2070  missed 2 ticks
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  9   3  21  65   0   2|3750k  765k|3169k  134k|1872k  152k|2273  1921
  5   6   1  83   0   4|3618k 1295k|6543k  330k|1891k  648k|4030  4131
  3   5   2  87   0   2|3600k 1054k|2851k  173k|1720k  527k|2815  2687
  4   7   1  83   0   5|3677k 1344k|6024k  314k|1844k  734k|3877  4376
  4   5   3  85   0   3|3953k  933k|3196k  152k|1989k  405k|2618  2321
  2   3   0  94   0   1|3106k  131k| 486k   24k|1544k  131k|1466  1374
  2   3   0  93   0   1|3089k  672k|1454k   65k|1540k  362k|1825  1909
  7   4   2  86   0   1|3393k  878k|1503k   84k|1694k  416k|1882  2033
  9   3  25  62   0   2|3496k 1833k|1979k   90k|1748k  848k|2112  1797
  6   4   3  84   0   3|3592k  861k|4340k  191k|1795k  432k|2926  3143
  4   6   0  87   0   3|3399k  847k|3758k  186k|1740k  440k|2699  4299
  1   2   0  97   0   1|2807k  365k| 685k   49k|1394k  168k|1175   840  missed 2 ticks
  2   3   4  90   0   2|3183k  801k|2022k   87k|1568k  399k|1998  1561
  2   3   2  91   0   2|3014k  726k|2214k   96k|1521k  368k|2072  1652
  4   5   2  86   0   3|3344k 1686k|4970k  217k|1659k  838k|3209  2936
  8   4  17  69   0   2|3026k  741k|1923k  107k|1510k  370k|1993  2227
  8   4  23  63   0   2|3496k 1026k|2948k  129k|1754k  513k|2347  2048
  6   7   2  81   0   4|3438k 1222k|5658k  272k|1746k  626k|3740  5708
  0   5   0  94   0   1|2902k   30k|1012k   43k|1435k    0 |1637  1161
  1   2   2  93   0   1|2968k  102k| 985k   59k|1471k  122k|1402  1101
  4   5   1  88   0   3|3651k 1814k|3838k  170k|1840k  841k|2769  2382
  2   2   1  94   0   1|2570k  344k| 500k   23k|1283k  214k|1360  1299
  5   3   2  89   0   1|2728k  964k|1119k   70k|1378k  450k|1760  2024
  8   3  24  64   0   1|2993k  967k| 737k   29k|1470k  468k|1432  1251
 12   2  37  48   0   1|2547k  710k| 651k   26k|1274k  360k|1435  1199
  9   3  26  60   0   2|3218k 1630k|3540k  153k|1612k  847k|2723  2174
  3   4   5  85   0   3|3618k  870k|3796k  168k|1807k  414k|2653  2497
  4   5   0  90   0   1|3134k  841k|1489k   81k|1591k  419k|1972  3498
  1   2   0  97   0   1|2910k  349k| 816k   55k|1438k  191k|1525  1096
  3   4   2  89   0   2|3240k  930k|2779k  122k|1610k  433k|2313  2036
  4   5   0  89   0   2|3079k 1340k|4054k  184k|1549k  670k|2981  3567
  2   6   1  90   0   1|2702k  256k|1080k   50k|1348k  178k|1658  1413
  3   4   6  85   0   2|3798k 1128k|2208k  105k|1890k  513k|2194  1984
 10   3  33  53   0   1|3619k 1239k|1147k   50k|1821k  620k|1708  1563
  7   5  12  73   0   3|3689k 1795k|3633k  185k|1833k  898k|2744  2404  missed 2 ticks
  4   4   4  85   0   3|3309k  282k|3728k  168k|1662k  166k|2661  2891
  2  11   0  84   0   2|2989k  195k|3949k  186k|1530k   92k|2528  3687
  0   2   0  96   0   1|2576k   67k|1148k   67k|1278k   40k|1668  1124
  1   2   0  95   0   2|2680k  896k|2093k   94k|1317k  548k|2088  1564
  1   2   0  95   0   1|2938k  809k|1769k   72k|1461k  279k|1825  1385
  2   3   3  90   0   2|3099k 1158k|2854k  125k|1562k  611k|2317  1841
  4   4   1  90   0   2|2806k  670k|2139k   94k|1398k  303k|2096  2173
  9   5  11  73   0   2|2930k 1646k|2741k  122k|1454k  823k|2504  2515
 11   3  29  56   0   1|3154k 1049k|1453k   85k|1578k  524k|1849  1599
  5   4   5  84   0   2|3135k  489k|3718k  161k|1570k  268k|2806  2712
  3   4   2  90   0   1|3010k  513k|1514k   82k|1530k  233k|1936  2989
  3   4   0  91   0   2|2891k  378k|3174k  148k|1430k  196k|2562  2776
  2  12   0  83   0   2|3146k  310k|3730k  184k|1569k  149k|2399  2101
  3   3   0  93   0   1|2491k  358k|1628k   73k|1245k  179k|1837  1755

  Thanks,
  Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-12  1:59                             ` Wu Fengguang
@ 2009-06-15 18:22                               ` Johannes Weiner
  -1 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-15 18:22 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > (even worse this time):
> > 
> > Thanks for doing the tests.  Do you know if the time difference comes
> > from IO or CPU time?
> > 
> > Because one reason I could think of is that the original code walks
> > the readaround window in two directions, starting from the target each
> > time but immediately stops when it encounters a hole where the new
> > code just skips holes but doesn't abort readaround and thus might
> > indeed read more slots.
> > 
> > I have an old patch flying around that changed the physical ra code to
> > use a bitmap that is able to represent holes.  If the increased time
> > is waiting for IO, I would be interested if that patch has the same
> > negative impact.
> 
> You can send me the patch :)

Okay, attached is a rebase against latest -mmotm.

> But for this patch it is IO bound. The CPU iowait field actually is
> going up as the test goes on:

It's probably the larger ra window then which takes away the bandwidth
needed to load the new executables.  This sucks.  Would be nice to
have 'optional IO' for readahead that is dropped when normal-priority
IO requests are coming in...  Oh, we have READA for bios.  But it
doesn't seem to implement dropping requests on load (or I am blind).

	Hannes

---

diff --git a/include/linux/swap.h b/include/linux/swap.h
index c88b366..119ad43 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -284,7 +284,7 @@ extern swp_entry_t get_swap_page(void);
 extern swp_entry_t get_swap_page_of_type(int);
 extern void swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
+extern pgoff_t valid_swaphandles(swp_entry_t, unsigned long *, unsigned long);
 extern void swap_free(swp_entry_t);
 extern void swapcache_free(swp_entry_t, struct page *page);
 extern int free_swap_and_cache(swp_entry_t);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 42cd38e..c9f9c97 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -348,10 +348,10 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr)
 {
-	int nr_pages;
-	struct page *page;
+	unsigned long nr_slots = 1 << page_cluster;
+	DECLARE_BITMAP(slots, nr_slots);
 	unsigned long offset;
-	unsigned long end_offset;
+	pgoff_t base;
 
 	/*
 	 * Get starting offset for readaround, and number of pages to read.
@@ -360,11 +360,15 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	 * more likely that neighbouring swap pages came from the same node:
 	 * so use the same "addr" to choose the same node for each swap read.
 	 */
-	nr_pages = valid_swaphandles(entry, &offset);
-	for (end_offset = offset + nr_pages; offset < end_offset; offset++) {
-		/* Ok, do the async read-ahead now */
-		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
-						gfp_mask, vma, addr);
+	base = valid_swaphandles(entry, slots, nr_slots);
+	for (offset = find_first_bit(slots, nr_slots);
+	     offset < nr_slots;
+	     offset = find_next_bit(slots, nr_slots, offset + 1)) {
+		struct page *page;
+		swp_entry_t tmp;
+
+		tmp = swp_entry(swp_type(entry), base + offset);
+		page = read_swap_cache_async(tmp, gfp_mask, vma, addr);
 		if (!page)
 			break;
 		page_cache_release(page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d1ade1a..27771dd 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2163,25 +2163,28 @@ get_swap_info_struct(unsigned type)
 	return &swap_info[type];
 }
 
+static int swap_inuse(unsigned long count)
+{
+	int swapcount = swap_count(count);
+	return swapcount && swapcount != SWAP_MAP_BAD;
+}
+
 /*
  * swap_lock prevents swap_map being freed. Don't grab an extra
  * reference on the swaphandle, it doesn't matter if it becomes unused.
  */
-int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
+pgoff_t valid_swaphandles(swp_entry_t entry, unsigned long *slots,
+			unsigned long nr_slots)
 {
 	struct swap_info_struct *si;
-	int our_page_cluster = page_cluster;
-	pgoff_t target, toff;
-	pgoff_t base, end;
-	int nr_pages = 0;
-
-	if (!our_page_cluster)	/* no readahead */
-		return 0;
+	pgoff_t target, base, end;
 
+	bitmap_zero(slots, nr_slots);
 	si = &swap_info[swp_type(entry)];
 	target = swp_offset(entry);
-	base = (target >> our_page_cluster) << our_page_cluster;
-	end = base + (1 << our_page_cluster);
+	base = target & ~(nr_slots - 1);
+	end = base + nr_slots;
+
 	if (!base)		/* first page is swap header */
 		base++;
 
@@ -2189,28 +2192,10 @@ int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
 	if (end > si->max)	/* don't go beyond end of map */
 		end = si->max;
 
-	/* Count contiguous allocated slots above our target */
-	for (toff = target; ++toff < end; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (swap_count(si->swap_map[toff]) == SWAP_MAP_BAD)
-			break;
-	}
-	/* Count contiguous allocated slots below our target */
-	for (toff = target; --toff >= base; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (swap_count(si->swap_map[toff]) == SWAP_MAP_BAD)
-			break;
-	}
-	spin_unlock(&swap_lock);
+	while (end-- > base)
+		if (end == target || swap_inuse(si->swap_map[end]))
+			set_bit(end - base, slots);
 
-	/*
-	 * Indicate starting offset, and return number of pages to get:
-	 * if only 1, say 0, since there's then no readahead to be done.
-	 */
-	*offset = ++toff;
-	return nr_pages? ++nr_pages: 0;
+	spin_unlock(&swap_lock);
+	return base;
 }


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-15 18:22                               ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-15 18:22 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > (even worse this time):
> > 
> > Thanks for doing the tests.  Do you know if the time difference comes
> > from IO or CPU time?
> > 
> > Because one reason I could think of is that the original code walks
> > the readaround window in two directions, starting from the target each
> > time but immediately stops when it encounters a hole where the new
> > code just skips holes but doesn't abort readaround and thus might
> > indeed read more slots.
> > 
> > I have an old patch flying around that changed the physical ra code to
> > use a bitmap that is able to represent holes.  If the increased time
> > is waiting for IO, I would be interested if that patch has the same
> > negative impact.
> 
> You can send me the patch :)

Okay, attached is a rebase against latest -mmotm.

> But for this patch it is IO bound. The CPU iowait field actually is
> going up as the test goes on:

It's probably the larger ra window then which takes away the bandwidth
needed to load the new executables.  This sucks.  Would be nice to
have 'optional IO' for readahead that is dropped when normal-priority
IO requests are coming in...  Oh, we have READA for bios.  But it
doesn't seem to implement dropping requests on load (or I am blind).

	Hannes

---

diff --git a/include/linux/swap.h b/include/linux/swap.h
index c88b366..119ad43 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -284,7 +284,7 @@ extern swp_entry_t get_swap_page(void);
 extern swp_entry_t get_swap_page_of_type(int);
 extern void swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
+extern pgoff_t valid_swaphandles(swp_entry_t, unsigned long *, unsigned long);
 extern void swap_free(swp_entry_t);
 extern void swapcache_free(swp_entry_t, struct page *page);
 extern int free_swap_and_cache(swp_entry_t);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 42cd38e..c9f9c97 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -348,10 +348,10 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr)
 {
-	int nr_pages;
-	struct page *page;
+	unsigned long nr_slots = 1 << page_cluster;
+	DECLARE_BITMAP(slots, nr_slots);
 	unsigned long offset;
-	unsigned long end_offset;
+	pgoff_t base;
 
 	/*
 	 * Get starting offset for readaround, and number of pages to read.
@@ -360,11 +360,15 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	 * more likely that neighbouring swap pages came from the same node:
 	 * so use the same "addr" to choose the same node for each swap read.
 	 */
-	nr_pages = valid_swaphandles(entry, &offset);
-	for (end_offset = offset + nr_pages; offset < end_offset; offset++) {
-		/* Ok, do the async read-ahead now */
-		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
-						gfp_mask, vma, addr);
+	base = valid_swaphandles(entry, slots, nr_slots);
+	for (offset = find_first_bit(slots, nr_slots);
+	     offset < nr_slots;
+	     offset = find_next_bit(slots, nr_slots, offset + 1)) {
+		struct page *page;
+		swp_entry_t tmp;
+
+		tmp = swp_entry(swp_type(entry), base + offset);
+		page = read_swap_cache_async(tmp, gfp_mask, vma, addr);
 		if (!page)
 			break;
 		page_cache_release(page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d1ade1a..27771dd 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2163,25 +2163,28 @@ get_swap_info_struct(unsigned type)
 	return &swap_info[type];
 }
 
+static int swap_inuse(unsigned long count)
+{
+	int swapcount = swap_count(count);
+	return swapcount && swapcount != SWAP_MAP_BAD;
+}
+
 /*
  * swap_lock prevents swap_map being freed. Don't grab an extra
  * reference on the swaphandle, it doesn't matter if it becomes unused.
  */
-int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
+pgoff_t valid_swaphandles(swp_entry_t entry, unsigned long *slots,
+			unsigned long nr_slots)
 {
 	struct swap_info_struct *si;
-	int our_page_cluster = page_cluster;
-	pgoff_t target, toff;
-	pgoff_t base, end;
-	int nr_pages = 0;
-
-	if (!our_page_cluster)	/* no readahead */
-		return 0;
+	pgoff_t target, base, end;
 
+	bitmap_zero(slots, nr_slots);
 	si = &swap_info[swp_type(entry)];
 	target = swp_offset(entry);
-	base = (target >> our_page_cluster) << our_page_cluster;
-	end = base + (1 << our_page_cluster);
+	base = target & ~(nr_slots - 1);
+	end = base + nr_slots;
+
 	if (!base)		/* first page is swap header */
 		base++;
 
@@ -2189,28 +2192,10 @@ int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
 	if (end > si->max)	/* don't go beyond end of map */
 		end = si->max;
 
-	/* Count contiguous allocated slots above our target */
-	for (toff = target; ++toff < end; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (swap_count(si->swap_map[toff]) == SWAP_MAP_BAD)
-			break;
-	}
-	/* Count contiguous allocated slots below our target */
-	for (toff = target; --toff >= base; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (swap_count(si->swap_map[toff]) == SWAP_MAP_BAD)
-			break;
-	}
-	spin_unlock(&swap_lock);
+	while (end-- > base)
+		if (end == target || swap_inuse(si->swap_map[end]))
+			set_bit(end - base, slots);
 
-	/*
-	 * Indicate starting offset, and return number of pages to get:
-	 * if only 1, say 0, since there's then no readahead to be done.
-	 */
-	*offset = ++toff;
-	return nr_pages? ++nr_pages: 0;
+	spin_unlock(&swap_lock);
+	return base;
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-11  5:31   ` KAMEZAWA Hiroyuki
@ 2009-06-17 22:41     ` Johannes Weiner
  -1 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-17 22:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 9 Jun 2009 21:01:28 +0200
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> > [resend with lists cc'd, sorry]
> > 
> > +static int swap_readahead_ptes(struct mm_struct *mm,
> > +			unsigned long addr, pmd_t *pmd,
> > +			swp_entry_t *entries,
> > +			unsigned long cluster)
> > +{
> > +	unsigned long window, min, max, limit;
> > +	spinlock_t *ptl;
> > +	pte_t *ptep;
> > +	int i, nr;
> > +
> > +	window = cluster << PAGE_SHIFT;
> > +	min = addr & ~(window - 1);
> > +	max = min + cluster;
> 
> Johannes, I wonder there is no reason to use "alignment".

I am wondering too.  I digged into the archives but the alignment
comes from a change older than what history.git documents, so I wasn't
able to find written down justification for this.

> I think we just need to read "nearby" pages. Then, this function's
> scan range should be
> 
> 	[addr - window/2, addr + window/2)
> or some.
> 
> And here, too
> > +	if (!entries)	/* XXX: shmem case */
> > +		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> > +	pmin = swp_offset(entry) & ~(cluster - 1);
> > +	pmax = pmin + cluster;
> 
> pmin = swp_offset(entry) - cluster/2.
> pmax = swp_offset(entry) + cluster/2.
> 
> I'm sorry if I miss a reason for using "alignment".

Perhas someone else knows a good reason for it, but I think it could
even be harmful.

Chances are that several processes fault around the same slots
simultaneously.  By letting them all start at the same aligned offset
we have a maximum race between them and they all allocate pages for
the same slots concurrently.

By placing the window unaligned we decrease this overlapping, so it
sounds like a good idea.

It would increase the amount of readahead done even more, though, and
Fengguang already measured degradation in IO latency with my patch, so
this probably needs more changes to work well.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-17 22:41     ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-17 22:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Rik van Riel, Hugh Dickins, Andi Kleen,
	Wu Fengguang, Minchan Kim, linux-mm, linux-kernel

On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 9 Jun 2009 21:01:28 +0200
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> > [resend with lists cc'd, sorry]
> > 
> > +static int swap_readahead_ptes(struct mm_struct *mm,
> > +			unsigned long addr, pmd_t *pmd,
> > +			swp_entry_t *entries,
> > +			unsigned long cluster)
> > +{
> > +	unsigned long window, min, max, limit;
> > +	spinlock_t *ptl;
> > +	pte_t *ptep;
> > +	int i, nr;
> > +
> > +	window = cluster << PAGE_SHIFT;
> > +	min = addr & ~(window - 1);
> > +	max = min + cluster;
> 
> Johannes, I wonder there is no reason to use "alignment".

I am wondering too.  I digged into the archives but the alignment
comes from a change older than what history.git documents, so I wasn't
able to find written down justification for this.

> I think we just need to read "nearby" pages. Then, this function's
> scan range should be
> 
> 	[addr - window/2, addr + window/2)
> or some.
> 
> And here, too
> > +	if (!entries)	/* XXX: shmem case */
> > +		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> > +	pmin = swp_offset(entry) & ~(cluster - 1);
> > +	pmax = pmin + cluster;
> 
> pmin = swp_offset(entry) - cluster/2.
> pmax = swp_offset(entry) + cluster/2.
> 
> I'm sorry if I miss a reason for using "alignment".

Perhas someone else knows a good reason for it, but I think it could
even be harmful.

Chances are that several processes fault around the same slots
simultaneously.  By letting them all start at the same aligned offset
we have a maximum race between them and they all allocate pages for
the same slots concurrently.

By placing the window unaligned we decrease this overlapping, so it
sounds like a good idea.

It would increase the amount of readahead done even more, though, and
Fengguang already measured degradation in IO latency with my patch, so
this probably needs more changes to work well.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-15 18:22                               ` Johannes Weiner
@ 2009-06-18  9:19                                 ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-18  9:19 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > > (even worse this time):
> > > 
> > > Thanks for doing the tests.  Do you know if the time difference comes
> > > from IO or CPU time?
> > > 
> > > Because one reason I could think of is that the original code walks
> > > the readaround window in two directions, starting from the target each
> > > time but immediately stops when it encounters a hole where the new
> > > code just skips holes but doesn't abort readaround and thus might
> > > indeed read more slots.
> > > 
> > > I have an old patch flying around that changed the physical ra code to
> > > use a bitmap that is able to represent holes.  If the increased time
> > > is waiting for IO, I would be interested if that patch has the same
> > > negative impact.
> > 
> > You can send me the patch :)
> 
> Okay, attached is a rebase against latest -mmotm.
> 
> > But for this patch it is IO bound. The CPU iowait field actually is
> > going up as the test goes on:
> 
> It's probably the larger ra window then which takes away the bandwidth
> needed to load the new executables.  This sucks.  Would be nice to
> have 'optional IO' for readahead that is dropped when normal-priority
> IO requests are coming in...  Oh, we have READA for bios.  But it
> doesn't seem to implement dropping requests on load (or I am blind).

Hi Hannes,

Sorry for the long delay! A bad news is that I get many oom with this patch:

[  781.450862] Xorg invoked oom-killer: gfp_mask=0xd2, order=0, oom_adj=0
[  781.457411] Pid: 3272, comm: Xorg Not tainted 2.6.30-rc8-mm1 #312
[  781.463511] Call Trace:
[  781.465976]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  781.471462]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  781.477210]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  781.482449]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  781.488188]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  781.493666]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  781.500015]  [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[  781.505846]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  781.511857]  [<ffffffff810e7fe8>] __vmalloc_area_node+0xf8/0x190
[  781.517869]  [<ffffffffa014c9b5>] ? i915_gem_execbuffer+0xb45/0x12f0 [i915]
[  781.524835]  [<ffffffff810e8121>] __vmalloc_node+0xa1/0xb0
[  781.530346]  [<ffffffffa014c9b5>] ? i915_gem_execbuffer+0xb45/0x12f0 [i915]
[  781.537312]  [<ffffffffa014bf2b>] ? i915_gem_execbuffer+0xbb/0x12f0 [i915]
[  781.544192]  [<ffffffff810e8281>] vmalloc+0x21/0x30
[  781.549100]  [<ffffffffa014c9b5>] i915_gem_execbuffer+0xb45/0x12f0 [i915]
[  781.555920]  [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[  781.561789]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  781.567569]  [<ffffffffa014be70>] ? i915_gem_execbuffer+0x0/0x12f0 [i915]
[  781.574383]  [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[  781.580225]  [<ffffffff8110babd>] vfs_ioctl+0x7d/0xa0
[  781.585287]  [<ffffffff8110bb6a>] do_vfs_ioctl+0x8a/0x580
[  781.590706]  [<ffffffff81078f3a>] ? lockdep_sys_exit+0x2a/0x90
[  781.596552]  [<ffffffff81544b34>] ? lockdep_sys_exit_thunk+0x35/0x67
[  781.602929]  [<ffffffff8110c0aa>] sys_ioctl+0x4a/0x80
[  781.607995]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  781.614005] Mem-Info:
[  781.616293] Node 0 DMA per-cpu:
[  781.619471] CPU    0: hi:    0, btch:   1 usd:   0
[  781.624278] CPU    1: hi:    0, btch:   1 usd:   0
[  781.629080] Node 0 DMA32 per-cpu:
[  781.632443] CPU    0: hi:  186, btch:  31 usd:  83
[  781.637243] CPU    1: hi:  186, btch:  31 usd: 108
[  781.642045] Active_anon:41057 active_file:2334 inactive_anon:47003
[  781.642048]  inactive_file:2148 unevictable:4 dirty:0 writeback:0 unstable:0
[  781.642051]  free:1180 slab:14177 mapped:4473 pagetables:7629 bounce:0
[  781.661802] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5408kB inactive_anon:5676kB active_file:16kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:42276 all_unreclaimable? no
[  781.680773] lowmem_reserve[]: 0 483 483 483
[  781.685089] Node 0 DMA32 free:2704kB min:2768kB low:3460kB high:4152kB active_anon:158820kB inactive_anon:182224kB active_file:9320kB inactive_file:8592kB unevictable:16kB present:495008kB pages_scanned:673623 all_unreclaimable? yes
[  781.705711] lowmem_reserve[]: 0 0 0 0
[  781.709501] Node 0 DMA: 104*4kB 0*8kB 6*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  781.720553] Node 0 DMA32: 318*4kB 1*8kB 1*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2704kB
[  781.731764] 61569 total pagecache pages
[  781.735618] 6489 pages in swap cache
[  781.739212] Swap cache stats: add 285146, delete 278657, find 31455/133061
[  781.746092] Free swap  = 709316kB
[  781.749417] Total swap = 1048568kB
[  781.759726] 131072 pages RAM
[  781.762645] 9628 pages reserved
[  781.765793] 95620 pages shared
[  781.768862] 58466 pages non-shared
[  781.772278] Out of memory: kill process 3487 (run-many-x-apps) score 1471069 or a child
[  781.780291] Killed process 3488 (xeyes)
[  781.830240] gtali invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  781.837208] Pid: 4113, comm: gtali Not tainted 2.6.30-rc8-mm1 #312
[  781.843554] Call Trace:
[  781.846233]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  781.851870]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  781.857615]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  781.862840]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  781.868578]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  781.874054]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  781.880401]  [<ffffffff810f3fb6>] alloc_page_vma+0x86/0x1c0
[  781.885969]  [<ffffffff810e9d08>] read_swap_cache_async+0xd8/0x120
[  781.892147]  [<ffffffff810e9f05>] swapin_readahead+0xb5/0x110
[  781.897886]  [<ffffffff810dac73>] do_swap_page+0x403/0x510
[  781.903366]  [<ffffffff810e9933>] ? lookup_swap_cache+0x13/0x30
[  781.909279]  [<ffffffff810da8ea>] ? do_swap_page+0x7a/0x510
[  781.914850]  [<ffffffff810dc72e>] handle_mm_fault+0x44e/0x500
[  781.920587]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  781.926149]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  781.931287] Mem-Info:
[  781.933559] Node 0 DMA per-cpu:
[  781.936714] CPU    0: hi:    0, btch:   1 usd:   0
[  781.941500] CPU    1: hi:    0, btch:   1 usd:   0
[  781.946288] Node 0 DMA32 per-cpu:
[  781.949615] CPU    0: hi:  186, btch:  31 usd:  84
[  781.954402] CPU    1: hi:  186, btch:  31 usd: 109
[  781.959192] Active_anon:41029 active_file:2334 inactive_anon:46908
[  781.959193]  inactive_file:2211 unevictable:4 dirty:0 writeback:0 unstable:0
[  781.959194]  free:1180 slab:14177 mapped:4492 pagetables:7608 bounce:0
[  781.978897] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5296kB inactive_anon:5408kB active_file:16kB inactive_file:176kB unevictable:0kB present:15164kB pages_scanned:6816 all_unreclaimable? no
[  781.997900] lowmem_reserve[]: 0 483 483 483
[  782.002173] Node 0 DMA32 free:2704kB min:2768kB low:3460kB high:4152kB active_anon:158820kB inactive_anon:182224kB active_file:9320kB inactive_file:8668kB unevictable:16kB present:495008kB pages_scanned:674199 all_unreclaimable? yes
[  782.022740] lowmem_reserve[]: 0 0 0 0
[  782.026488] Node 0 DMA: 82*4kB 9*8kB 7*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  782.037309] Node 0 DMA32: 318*4kB 1*8kB 1*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2704kB
[  782.048405] 61637 total pagecache pages
[  782.052236] 6494 pages in swap cache
[  782.055809] Swap cache stats: add 285154, delete 278660, find 31456/133069
[  782.062672] Free swap  = 709592kB
[  782.065983] Total swap = 1048568kB
[  782.072735] 131072 pages RAM
[  782.075632] 9628 pages reserved
[  782.078774] 95669 pages shared
[  782.081822] 58413 pages non-shared
[  782.085223] Out of memory: kill process 3487 (run-many-x-apps) score 1466556 or a child
[  782.093215] Killed process 3566 (gthumb)
[  790.063897] gnome-panel invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  790.071664] Pid: 3405, comm: gnome-panel Not tainted 2.6.30-rc8-mm1 #312
[  790.078421] Call Trace:
[  790.080902]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  790.086410]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  790.092159]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  790.097387]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  790.103135]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  790.108632]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  790.115001]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  790.121002]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  790.126745]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  790.133352]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  790.140057]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  790.145103]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[  790.150678]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  790.155902]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  790.161989]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  790.167738]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  790.173304]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  790.178441] Mem-Info:
[  790.180714] Node 0 DMA per-cpu:
[  790.183870] CPU    0: hi:    0, btch:   1 usd:   0
[  790.188659] CPU    1: hi:    0, btch:   1 usd:   0
[  790.193446] Node 0 DMA32 per-cpu:
[  790.196783] CPU    0: hi:  186, btch:  31 usd:  43
[  790.201569] CPU    1: hi:  186, btch:  31 usd:  31
[  790.206359] Active_anon:41179 active_file:900 inactive_anon:46967
[  790.206360]  inactive_file:4104 unevictable:4 dirty:0 writeback:0 unstable:0
[  790.206361]  free:1165 slab:13961 mapped:3241 pagetables:7475 bounce:0
[  790.225984] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5496kB inactive_anon:5800kB active_file:4kB inactive_file:220kB unevictable:0kB present:15164kB pages_scanned:26112 all_unreclaimable? yes
[  790.245079] lowmem_reserve[]: 0 483 483 483
[  790.249352] Node 0 DMA32 free:2648kB min:2768kB low:3460kB high:4152kB active_anon:159220kB inactive_anon:182068kB active_file:3596kB inactive_file:16196kB unevictable:16kB present:495008kB pages_scanned:875456 all_unreclaimable? yes
[  790.270005] lowmem_reserve[]: 0 0 0 0
[  790.273762] Node 0 DMA: 53*4kB 9*8kB 12*16kB 2*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  790.284681] Node 0 DMA32: 190*4kB 46*8kB 7*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2648kB
[  790.295866] 62097 total pagecache pages
[  790.299698] 6548 pages in swap cache
[  790.303271] Swap cache stats: add 286032, delete 279484, find 31565/133879
[  790.310137] Free swap  = 717460kB
[  790.313445] Total swap = 1048568kB
[  790.320544] 131072 pages RAM
[  790.323445] 9628 pages reserved
[  790.326591] 85371 pages shared
[  790.329641] 59742 pages non-shared
[  790.333046] Out of memory: kill process 3487 (run-many-x-apps) score 1258333 or a child
[  790.341039] Killed process 3599 (gedit)
[  790.382081] gedit used greatest stack depth: 2064 bytes left
[  792.149572] Xorg invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[  792.156786] Pid: 3272, comm: Xorg Not tainted 2.6.30-rc8-mm1 #312
[  792.162980] Call Trace:
[  792.165429]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  792.170937]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  792.176691]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  792.181909]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  792.187653]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  792.193136]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  792.199490]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  792.205491]  [<ffffffff810c7409>] __get_free_pages+0x9/0x50
[  792.211060]  [<ffffffff8110e402>] __pollwait+0xc2/0x100
[  792.216283]  [<ffffffff81495903>] unix_poll+0x23/0xc0
[  792.221330]  [<ffffffff81419ac8>] sock_poll+0x18/0x20
[  792.226380]  [<ffffffff8110d9a9>] do_select+0x3e9/0x730
[  792.231597]  [<ffffffff8110d5c0>] ? do_select+0x0/0x730
[  792.236816]  [<ffffffff8110e340>] ? __pollwait+0x0/0x100
[  792.242126]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.247180]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.252227]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.257275]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.262331]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.267377]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.272422]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.277468]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.282519]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.287574]  [<ffffffff8110deef>] core_sys_select+0x1ff/0x330
[  792.293317]  [<ffffffff8110dd38>] ? core_sys_select+0x48/0x330
[  792.299162]  [<ffffffffa014954c>] ? i915_gem_throttle_ioctl+0x4c/0x60 [i915]
[  792.306204]  [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[  792.312034]  [<ffffffff810706cc>] ? getnstimeofday+0x5c/0xf0
[  792.317687]  [<ffffffff8106acb9>] ? ktime_get_ts+0x59/0x60
[  792.323169]  [<ffffffff8110e27a>] sys_select+0x4a/0x110
[  792.328387]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  792.334389] Mem-Info:
[  792.336663] Node 0 DMA per-cpu:
[  792.339824] CPU    0: hi:    0, btch:   1 usd:   0
[  792.344612] CPU    1: hi:    0, btch:   1 usd:   0
[  792.349397] Node 0 DMA32 per-cpu:
[  792.352734] CPU    0: hi:  186, btch:  31 usd:  57
[  792.357518] CPU    1: hi:  186, btch:  31 usd:  50
[  792.362310] Active_anon:40862 active_file:1622 inactive_anon:47020
[  792.362311]  inactive_file:3746 unevictable:4 dirty:0 writeback:0 unstable:0
[  792.362313]  free:1187 slab:13902 mapped:4052 pagetables:7387 bounce:0
[  792.382030] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5428kB inactive_anon:5680kB active_file:0kB inactive_file:224kB unevictable:0kB present:15164kB pages_scanned:4992 all_unreclaimable? no
[  792.400957] lowmem_reserve[]: 0 483 483 483
[  792.405232] Node 0 DMA32 free:2736kB min:2768kB low:3460kB high:4152kB active_anon:158020kB inactive_anon:182284kB active_file:6488kB inactive_file:14760kB unevictable:16kB present:495008kB pages_scanned:876741 all_unreclaimable? yes
[  792.425889] lowmem_reserve[]: 0 0 0 0
[  792.429637] Node 0 DMA: 31*4kB 14*8kB 15*16kB 2*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  792.440651] Node 0 DMA32: 86*4kB 95*8kB 14*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2736kB
[  792.451821] 62288 total pagecache pages
[  792.455655] 6442 pages in swap cache
[  792.459230] Swap cache stats: add 286223, delete 279781, find 31574/134040
[  792.466100] Free swap  = 723520kB
[  792.469405] Total swap = 1048568kB
[  792.476461] 131072 pages RAM
[  792.479359] 9628 pages reserved
[  792.482502] 86274 pages shared
[  792.485547] 59031 pages non-shared
[  792.488956] Out of memory: kill process 3487 (run-many-x-apps) score 1235901 or a child
[  792.496952] Killed process 3626 (xpdf.bin)
[  912.097890] gnome-control-c invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  912.105967] Pid: 5395, comm: gnome-control-c Not tainted 2.6.30-rc8-mm1 #312
[  912.113042] Call Trace:
[  912.115499]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  912.120994]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  912.126737]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  912.131961]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  912.137709]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  912.143193]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  912.149547]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  912.155551]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  912.161295]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  912.167904]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  912.174602]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  912.179650]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[  912.185221]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  912.190445]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  912.196539]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  912.202278]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  912.207840]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  912.212976] Mem-Info:
[  912.215247] Node 0 DMA per-cpu:
[  912.218402] CPU    0: hi:    0, btch:   1 usd:   0
[  912.223190] CPU    1: hi:    0, btch:   1 usd:   0
[  912.227979] Node 0 DMA32 per-cpu:
[  912.231315] CPU    0: hi:  186, btch:  31 usd: 118
[  912.236100] CPU    1: hi:  186, btch:  31 usd: 158
[  912.240891] Active_anon:42350 active_file:809 inactive_anon:47098
[  912.240892]  inactive_file:2682 unevictable:4 dirty:0 writeback:3 unstable:0
[  912.240893]  free:1164 slab:13886 mapped:3078 pagetables:7561 bounce:0
[  912.260546] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5456kB inactive_anon:5676kB active_file:4kB inactive_file:72kB unevictable:0kB present:15164kB pages_scanned:1920 all_unreclaimable? no
[  912.279403] lowmem_reserve[]: 0 483 483 483
[  912.283671] Node 0 DMA32 free:2600kB min:2768kB low:3460kB high:4152kB active_anon:163944kB inactive_anon:182600kB active_file:3232kB inactive_file:10644kB unevictable:16kB present:495008kB pages_scanned:571360 all_unreclaimable? yes
[  912.304335] lowmem_reserve[]: 0 0 0 0
[  912.308082] Node 0 DMA: 22*4kB 16*8kB 12*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[  912.319093] Node 0 DMA32: 128*4kB 131*8kB 1*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2600kB
[  912.330367] 62393 total pagecache pages
[  912.334201] 7186 pages in swap cache
[  912.337778] Swap cache stats: add 320003, delete 312817, find 34852/153688
[  912.344648] Free swap  = 714408kB
[  912.347950] Total swap = 1048568kB
[  912.355114] 131072 pages RAM
[  912.358011] 9628 pages reserved
[  912.361153] 84608 pages shared
[  912.364199] 58138 pages non-shared
[  912.367606] Out of memory: kill process 3487 (run-many-x-apps) score 1281073 or a child
[  912.375604] Killed process 3669 (xterm)
[  912.427936] tty_ldisc_deref: no references.
[  912.480847] nautilus invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  912.487981] Pid: 3408, comm: nautilus Not tainted 2.6.30-rc8-mm1 #312
[  912.494418] Call Trace:
[  912.496876]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  912.502361]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  912.508100]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  912.513327]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  912.519067]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  912.524552]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  912.530902]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  912.536907]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  912.542645]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  912.549253]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  912.555946]  [<ffffffff810a9c9b>] ? delayacct_end+0x6b/0xa0
[  912.561517]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  912.566563]  [<ffffffff810cacb3>] ondemand_readahead+0x163/0x2d0
[  912.572563]  [<ffffffff810caf25>] page_cache_sync_readahead+0x25/0x30
[  912.579000]  [<ffffffff810c141c>] filemap_fault+0x37c/0x400
[  912.584576]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  912.589799]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  912.595888]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  912.601632]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  912.607206]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  912.612345] Mem-Info:
[  912.614624] Node 0 DMA per-cpu:
[  912.617787] CPU    0: hi:    0, btch:   1 usd:   0
[  912.622570] CPU    1: hi:    0, btch:   1 usd:   0
[  912.627353] Node 0 DMA32 per-cpu:
[  912.630682] CPU    0: hi:  186, btch:  31 usd: 121
[  912.635470] CPU    1: hi:  186, btch:  31 usd:  76
[  912.640259] Active_anon:42310 active_file:830 inactive_anon:47085
[  912.640260]  inactive_file:2747 unevictable:4 dirty:0 writeback:0 unstable:0
[  912.640261]  free:1182 slab:13881 mapped:3111 pagetables:7523 bounce:0
[  912.659881] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5468kB inactive_anon:5784kB active_file:4kB inactive_file:56kB unevictable:0kB present:15164kB pages_scanned:5152 all_unreclaimable? no
[  912.678724] lowmem_reserve[]: 0 483 483 483
[  912.682990] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:163772kB inactive_anon:182556kB active_file:3316kB inactive_file:10932kB unevictable:16kB present:495008kB pages_scanned:51712 all_unreclaimable? no
[  912.703478] lowmem_reserve[]: 0 0 0 0
[  912.707226] Node 0 DMA: 21*4kB 16*8kB 12*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2004kB
[  912.718239] Node 0 DMA32: 159*4kB 132*8kB 1*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2732kB
[  912.729502] 62461 total pagecache pages
[  912.733337] 7171 pages in swap cache
[  912.736915] Swap cache stats: add 320011, delete 312840, find 34852/153696
[  912.743782] Free swap  = 715668kB
[  912.747098] Total swap = 1048568kB
[  912.754168] 131072 pages RAM
[  912.757059] 9628 pages reserved
[  912.760191] 84519 pages shared
[  912.763248] 58139 pages non-shared
[  912.766653] Out of memory: kill process 3487 (run-many-x-apps) score 1273781 or a child
[  912.774647] Killed process 3762 (gnome-terminal)
[  913.650490] tty_ldisc_deref: no references.
[  914.671325] kerneloops-appl invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  914.679083] Pid: 3425, comm: kerneloops-appl Not tainted 2.6.30-rc8-mm1 #312
[  914.686121] Call Trace:
[  914.688575]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  914.694057]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  914.699800]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  914.705034]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  914.710791]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  914.716279]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  914.722640]  [<ffffffff810f3fb6>] alloc_page_vma+0x86/0x1c0
[  914.728208]  [<ffffffff810e9d08>] read_swap_cache_async+0xd8/0x120
[  914.734391]  [<ffffffff810e9f05>] swapin_readahead+0xb5/0x110
[  914.740139]  [<ffffffff810dac73>] do_swap_page+0x403/0x510
[  914.745632]  [<ffffffff810c0710>] ? find_get_page+0x0/0x110
[  914.751200]  [<ffffffff810e9933>] ? lookup_swap_cache+0x13/0x30
[  914.757115]  [<ffffffff810da8ea>] ? do_swap_page+0x7a/0x510
[  914.762688]  [<ffffffff810dc72e>] handle_mm_fault+0x44e/0x500
[  914.768437]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  914.774005]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  914.779136] Mem-Info:
[  914.781410] Node 0 DMA per-cpu:
[  914.784572] CPU    0: hi:    0, btch:   1 usd:   0
[  914.789367] CPU    1: hi:    0, btch:   1 usd:   0
[  914.794156] Node 0 DMA32 per-cpu:
[  914.797493] CPU    0: hi:  186, btch:  31 usd: 150
[  914.802278] CPU    1: hi:  186, btch:  31 usd: 147
[  914.807064] Active_anon:42324 active_file:1285 inactive_anon:47097
[  914.807065]  inactive_file:2225 unevictable:4 dirty:0 writeback:0 unstable:0
[  914.807067]  free:1185 slab:13908 mapped:3648 pagetables:7413 bounce:0
[  914.826781] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5360kB inactive_anon:5784kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:17408 all_unreclaimable? yes
[  914.845718] lowmem_reserve[]: 0 483 483 483
[  914.849988] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:163936kB inactive_anon:182604kB active_file:5140kB inactive_file:8908kB unevictable:16kB present:495008kB pages_scanned:581760 all_unreclaimable? yes
[  914.870559] lowmem_reserve[]: 0 0 0 0
[  914.874306] Node 0 DMA: 37*4kB 10*8kB 12*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[  914.885318] Node 0 DMA32: 119*4kB 139*8kB 7*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2724kB
[  914.896588] 62441 total pagecache pages
[  914.900417] 7199 pages in swap cache
[  914.903999] Swap cache stats: add 320272, delete 313073, find 34864/153895
[  914.910867] Free swap  = 721224kB
[  914.914193] Total swap = 1048568kB
[  914.921489] 131072 pages RAM
[  914.924370] 9628 pages reserved
[  914.927519] 84507 pages shared
[  914.930581] 57535 pages non-shared
[  914.933989] Out of memory: kill process 3487 (run-many-x-apps) score 1213315 or a child
[  914.941986] Killed process 3803 (urxvt)
[  914.947298] tty_ldisc_deref: no references.
[  919.983335] gnome-keyboard- invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  919.991145] Pid: 5458, comm: gnome-keyboard- Not tainted 2.6.30-rc8-mm1 #312
[  919.998198] Call Trace:
[  920.000663]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  920.006157]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  920.011906]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  920.017135]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  920.022876]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  920.028357]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  920.034706]  [<ffffffff810f3fb6>] alloc_page_vma+0x86/0x1c0
[  920.040280]  [<ffffffff810e9d08>] read_swap_cache_async+0xd8/0x120
[  920.046460]  [<ffffffff810e9f05>] swapin_readahead+0xb5/0x110
[  920.052196]  [<ffffffff810dac73>] do_swap_page+0x403/0x510
[  920.057676]  [<ffffffff810e9933>] ? lookup_swap_cache+0x13/0x30
[  920.063592]  [<ffffffff810da8ea>] ? do_swap_page+0x7a/0x510
[  920.069165]  [<ffffffff810dc72e>] handle_mm_fault+0x44e/0x500
[  920.074901]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  920.080470]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  920.085604] Mem-Info:
[  920.087875] Node 0 DMA per-cpu:
[  920.091031] CPU    0: hi:    0, btch:   1 usd:   0
[  920.095818] CPU    1: hi:    0, btch:   1 usd:   0
[  920.100617] Node 0 DMA32 per-cpu:
[  920.103947] CPU    0: hi:  186, btch:  31 usd:  89
[  920.108734] CPU    1: hi:  186, btch:  31 usd: 119
[  920.113524] Active_anon:42944 active_file:542 inactive_anon:46956
[  920.113525]  inactive_file:2652 unevictable:4 dirty:0 writeback:0 unstable:0
[  920.113526]  free:1169 slab:13893 mapped:3036 pagetables:7342 bounce:0
[  920.133149] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5568kB inactive_anon:5772kB active_file:20kB inactive_file:164kB unevictable:0kB present:15164kB pages_scanned:22824 all_unreclaimable? yes
[  920.152324] lowmem_reserve[]: 0 483 483 483
[  920.156597] Node 0 DMA32 free:2668kB min:2768kB low:3460kB high:4152kB active_anon:166208kB inactive_anon:182052kB active_file:2148kB inactive_file:10444kB unevictable:16kB present:495008kB pages_scanned:650400 all_unreclaimable? yes
[  920.177245] lowmem_reserve[]: 0 0 0 0
[  920.180991] Node 0 DMA: 44*4kB 9*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[  920.191903] Node 0 DMA32: 165*4kB 117*8kB 3*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2668kB
[  920.203169] 62409 total pagecache pages
[  920.207000] 7469 pages in swap cache
[  920.210572] Swap cache stats: add 321003, delete 313534, find 34989/154507
[  920.217436] Free swap  = 725812kB
[  920.220752] Total swap = 1048568kB
[  920.227856] 131072 pages RAM
[  920.230752] 9628 pages reserved
[  920.233901] 78560 pages shared
[  920.236958] 58011 pages non-shared
[  920.240355] Out of memory: kill process 3487 (run-many-x-apps) score 1195965 or a child
[  920.248346] Killed process 3889 (gnome-system-mo)
[  920.993872] nautilus invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  921.001843] Pid: 3408, comm: nautilus Not tainted 2.6.30-rc8-mm1 #312
[  921.008294] Call Trace:
[  921.010757]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  921.016245]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  921.021995]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  921.027215]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  921.032954]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  921.038441]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  921.044805]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  921.050808]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  921.056549]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  921.063163]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  921.069868]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  921.074918]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[  921.080487]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  921.085717]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  921.091805]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  921.097552]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  921.103145]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  921.108280] Mem-Info:
[  921.110556] Node 0 DMA per-cpu:
[  921.113720] CPU    0: hi:    0, btch:   1 usd:   0
[  921.118501] CPU    1: hi:    0, btch:   1 usd:   0
[  921.123286] Node 0 DMA32 per-cpu:
[  921.126614] CPU    0: hi:  186, btch:  31 usd:  25
[  921.131400] CPU    1: hi:  186, btch:  31 usd:  58
[  921.136187] Active_anon:42277 active_file:992 inactive_anon:46953
[  921.136188]  inactive_file:3279 unevictable:4 dirty:0 writeback:0 unstable:0
[  921.136189]  free:1183 slab:13728 mapped:3449 pagetables:7235 bounce:0
[  921.155810] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5540kB inactive_anon:5772kB active_file:20kB inactive_file:224kB unevictable:0kB present:15164kB pages_scanned:18464 all_unreclaimable? yes
[  921.174995] lowmem_reserve[]: 0 483 483 483
[  921.179259] Node 0 DMA32 free:2716kB min:2768kB low:3460kB high:4152kB active_anon:163568kB inactive_anon:182040kB active_file:3948kB inactive_file:12892kB unevictable:16kB present:495008kB pages_scanned:719674 all_unreclaimable? yes
[  921.199914] lowmem_reserve[]: 0 0 0 0
[  921.203661] Node 0 DMA: 50*4kB 7*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  921.214577] Node 0 DMA32: 257*4kB 45*8kB 19*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2716kB
[  921.225837] 63208 total pagecache pages
[  921.229675] 7214 pages in swap cache
[  921.233249] Swap cache stats: add 321070, delete 313856, find 34991/154562
[  921.240112] Free swap  = 730844kB
[  921.243427] Total swap = 1048568kB
[  921.250566] 131072 pages RAM
[  921.253460] 9628 pages reserved
[  921.256599] 79050 pages shared
[  921.259646] 57895 pages non-shared
[  921.263048] Out of memory: kill process 3487 (run-many-x-apps) score 1168892 or a child
[  921.271042] Killed process 3917 (gnome-help)
[  934.057490] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  934.065285] Pid: 3353, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #312
[  934.072425] Call Trace:
[  934.074882]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  934.080382]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  934.086126]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  934.091349]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  934.097091]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  934.102568]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  934.108914]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  934.114922]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  934.120667]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  934.127269]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  934.133963]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  934.139018]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[  934.144593]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  934.149812]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  934.155898]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  934.161640]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  934.167208]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  934.172348] Mem-Info:
[  934.174614] Node 0 DMA per-cpu:
[  934.177775] CPU    0: hi:    0, btch:   1 usd:   0
[  934.182560] CPU    1: hi:    0, btch:   1 usd:   0
[  934.187342] Node 0 DMA32 per-cpu:
[  934.190671] CPU    0: hi:  186, btch:  31 usd: 115
[  934.195459] CPU    1: hi:  186, btch:  31 usd: 146
[  934.200251] Active_anon:43024 active_file:1381 inactive_anon:46959
[  934.200252]  inactive_file:2292 unevictable:4 dirty:0 writeback:0 unstable:0
[  934.200253]  free:1170 slab:13755 mapped:4121 pagetables:7012 bounce:0
[  934.219958] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5532kB inactive_anon:5756kB active_file:16kB inactive_file:248kB unevictable:0kB present:15164kB pages_scanned:18348 all_unreclaimable? yes
[  934.239142] lowmem_reserve[]: 0 483 483 483
[  934.243408] Node 0 DMA32 free:2680kB min:2768kB low:3460kB high:4152kB active_anon:166564kB inactive_anon:182080kB active_file:5508kB inactive_file:8920kB unevictable:16kB present:495008kB pages_scanned:689667 all_unreclaimable? yes
[  934.263988] lowmem_reserve[]: 0 0 0 0
[  934.267735] Node 0 DMA: 60*4kB 0*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[  934.278662] Node 0 DMA32: 294*4kB 2*8kB 9*16kB 10*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2680kB
[  934.289834] 62846 total pagecache pages
[  934.293669] 7202 pages in swap cache
[  934.297244] Swap cache stats: add 322861, delete 315659, find 35288/156117
[  934.304107] Free swap  = 758748kB
[  934.307422] Total swap = 1048568kB
[  934.314470] 131072 pages RAM
[  934.317362] 9628 pages reserved
[  934.320501] 76930 pages shared
[  934.323549] 57149 pages non-shared
[  934.326955] Out of memory: kill process 3487 (run-many-x-apps) score 1006662 or a child
[  934.334948] Killed process 3952 (gnome-dictionar)
[  934.340708] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  934.348622] Pid: 3353, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #312
[  934.355318] Call Trace:
[  934.357768]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  934.363256]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  934.368998]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  934.372992]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  934.372992]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  934.385506]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  934.389481]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  934.397856]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  934.401848]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  934.410200]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  934.416894]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  934.421942]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[  934.425936]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  934.432734]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  934.438822]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  934.444566]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  934.448558]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  934.455262] Mem-Info:
[  934.457533] Node 0 DMA per-cpu:
[  934.460695] CPU    0: hi:    0, btch:   1 usd:   0
[  934.464690] CPU    1: hi:    0, btch:   1 usd:   0
[  934.470263] Node 0 DMA32 per-cpu:
[  934.473589] CPU    0: hi:  186, btch:  31 usd: 172
[  934.478377] CPU    1: hi:  186, btch:  31 usd: 145
[  934.482373] Active_anon:42768 active_file:1390 inactive_anon:46967
[  934.482373]  inactive_file:2301 unevictable:4 dirty:0 writeback:0 unstable:0
[  934.482373]  free:1495 slab:13778 mapped:4137 pagetables:6916 bounce:0
[  934.502869] Node 0 DMA free:2060kB min:84kB low:104kB high:124kB active_anon:5492kB inactive_anon:5788kB active_file:28kB inactive_file:252kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  934.521612] lowmem_reserve[]: 0 483 483 483
[  934.525885] Node 0 DMA32 free:3920kB min:2768kB low:3460kB high:4152kB active_anon:165580kB inactive_anon:182080kB active_file:5532kB inactive_file:8952kB unevictable:16kB present:495008kB pages_scanned:0 all_unreclaimable? no
[  934.545927] lowmem_reserve[]: 0 0 0 0
[  934.549677] Node 0 DMA: 71*4kB 2*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2060kB
[  934.560588] Node 0 DMA32: 588*4kB 10*8kB 9*16kB 10*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3920kB
[  934.568475] 62739 total pagecache pages
[  934.575685] 7086 pages in swap cache
[  934.579254] Swap cache stats: add 322861, delete 315775, find 35288/156117
[  934.586118] Free swap  = 763384kB
[  934.589433] Total swap = 1048568kB
[  934.597155] 131072 pages RAM
[  934.600036] 9628 pages reserved
[  934.600235] 76640 pages shared
[  934.606236] 56884 pages non-shared
[  934.609634] Out of memory: kill process 3487 (run-many-x-apps) score 978701 or a child
[  934.617540] Killed process 4014 (sol)
[ 1028.279307] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 1028.286714] Pid: 5554, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #312
[ 1028.293414] Call Trace:
[ 1028.295874]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 1028.301361]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 1028.307109]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 1028.312330]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 1028.318069]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 1028.323554]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 1028.329900]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 1028.335899]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 1028.341639]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 1028.348247]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 1028.354935]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 1028.359982]  [<ffffffff810cacb3>] ondemand_readahead+0x163/0x2d0
[ 1028.365986]  [<ffffffff810caf25>] page_cache_sync_readahead+0x25/0x30
[ 1028.372422]  [<ffffffff810c141c>] filemap_fault+0x37c/0x400
[ 1028.377985]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 1028.383205]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 1028.389291]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 1028.395031]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 1028.400594]  [<ffffffff81545a95>] page_fault+0x25/0x30
[ 1028.405726] Mem-Info:
[ 1028.408001] Node 0 DMA per-cpu:
[ 1028.411161] CPU    0: hi:    0, btch:   1 usd:   0
[ 1028.416012] CPU    1: hi:    0, btch:   1 usd:   0
[ 1028.420860] Node 0 DMA32 per-cpu:
[ 1028.424346] CPU    0: hi:  186, btch:  31 usd: 125
[ 1028.429129] CPU    1: hi:  186, btch:  31 usd:  17
[ 1028.433914] Active_anon:41222 active_file:1015 inactive_anon:47978
[ 1028.433915]  inactive_file:4149 unevictable:4 dirty:0 writeback:0 unstable:0
[ 1028.433916]  free:1168 slab:13459 mapped:4432 pagetables:6766 bounce:0
[ 1028.453622] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5520kB inactive_anon:5776kB active_file:0kB inactive_file:84kB unevictable:0kB present:15164kB pages_scanned:16704 all_unreclaimable? no
[ 1028.472548] lowmem_reserve[]: 0 483 483 483
[ 1028.476811] Node 0 DMA32 free:2672kB min:2768kB low:3460kB high:4152kB active_anon:159368kB inactive_anon:186136kB active_file:4060kB inactive_file:16512kB unevictable:16kB present:495008kB pages_scanned:566633 all_unreclaimable? yes
[ 1028.497459] lowmem_reserve[]: 0 0 0 0
[ 1028.501203] Node 0 DMA: 56*4kB 0*8kB 11*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[ 1028.512136] Node 0 DMA32: 278*4kB 3*8kB 4*16kB 8*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2672kB
[ 1028.523222] 64013 total pagecache pages
[ 1028.527049] 6900 pages in swap cache
[ 1028.530627] Swap cache stats: add 334539, delete 327639, find 36253/163064
[ 1028.537490] Free swap  = 775384kB
[ 1028.540803] Total swap = 1048568kB
[ 1028.547522] 131072 pages RAM
[ 1028.550399] 9628 pages reserved
[ 1028.553550] 79539 pages shared
[ 1028.556607] 57450 pages non-shared
[ 1028.560008] Out of memory: kill process 3487 (run-many-x-apps) score 938661 or a child
[ 1028.567914] Killed process 4046 (gnometris)
[ 1162.209886] Xorg invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[ 1162.216441] Pid: 3272, comm: Xorg Not tainted 2.6.30-rc8-mm1 #312
[ 1162.222536] Call Trace:
[ 1162.224993]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 1162.230485]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 1162.236231]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 1162.241461]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 1162.247198]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 1162.252677]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 1162.259027]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 1162.265027]  [<ffffffff810c7409>] __get_free_pages+0x9/0x50
[ 1162.270599]  [<ffffffff8110e402>] __pollwait+0xc2/0x100
[ 1162.275815]  [<ffffffff81495903>] unix_poll+0x23/0xc0
[ 1162.280860]  [<ffffffff81419ac8>] sock_poll+0x18/0x20
[ 1162.285907]  [<ffffffff8110d9a9>] do_select+0x3e9/0x730
[ 1162.291129]  [<ffffffff8110d5c0>] ? do_select+0x0/0x730
[ 1162.296349]  [<ffffffff8110e340>] ? __pollwait+0x0/0x100
[ 1162.301659]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.306706]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.311748]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.316792]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.321840]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.326886]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.331933]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.336979]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.342029]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.347071]  [<ffffffff8110deef>] core_sys_select+0x1ff/0x330
[ 1162.352807]  [<ffffffff8110dd38>] ? core_sys_select+0x48/0x330
[ 1162.358644]  [<ffffffffa014954c>] ? i915_gem_throttle_ioctl+0x4c/0x60 [i915]
[ 1162.365687]  [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[ 1162.371511]  [<ffffffff810706cc>] ? getnstimeofday+0x5c/0xf0
[ 1162.377161]  [<ffffffff8106acb9>] ? ktime_get_ts+0x59/0x60
[ 1162.382641]  [<ffffffff8110e27a>] sys_select+0x4a/0x110
[ 1162.387863]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 1162.393865] Mem-Info:
[ 1162.396132] Node 0 DMA per-cpu:
[ 1162.399294] CPU    0: hi:    0, btch:   1 usd:   0
[ 1162.404076] CPU    1: hi:    0, btch:   1 usd:   0
[ 1162.408858] Node 0 DMA32 per-cpu:
[ 1162.412185] CPU    0: hi:  186, btch:  31 usd: 161
[ 1162.416972] CPU    1: hi:  186, btch:  31 usd: 182
[ 1162.421762] Active_anon:42731 active_file:740 inactive_anon:48110
[ 1162.421763]  inactive_file:2851 unevictable:4 dirty:0 writeback:0 unstable:0
[ 1162.421764]  free:1174 slab:13321 mapped:3702 pagetables:6595 bounce:0
[ 1162.441384] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5552kB inactive_anon:5812kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:9376 all_unreclaimable? no
[ 1162.460128] lowmem_reserve[]: 0 483 483 483
[ 1162.464392] Node 0 DMA32 free:2688kB min:2768kB low:3460kB high:4152kB active_anon:165372kB inactive_anon:186628kB active_file:2960kB inactive_file:11404kB unevictable:16kB present:495008kB pages_scanned:675382 all_unreclaimable? yes
[ 1162.485048] lowmem_reserve[]: 0 0 0 0
[ 1162.488797] Node 0 DMA: 56*4kB 1*8kB 11*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[ 1162.499720] Node 0 DMA32: 274*4kB 3*8kB 8*16kB 7*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2688kB
[ 1162.510803] 62374 total pagecache pages
[ 1162.514635] 6690 pages in swap cache
[ 1162.518210] Swap cache stats: add 344648, delete 337958, find 37585/169560
[ 1162.525071] Free swap  = 796012kB
[ 1162.528385] Total swap = 1048568kB
[ 1162.535461] 131072 pages RAM
[ 1162.538352] 9628 pages reserved
[ 1162.541490] 73953 pages shared
[ 1162.544536] 58149 pages non-shared
[ 1162.547940] Out of memory: kill process 3487 (run-many-x-apps) score 918444 or a child
[ 1162.555846] Killed process 4079 (gnect)
[ 1162.634031] /usr/games/gnom invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 1162.641791] Pid: 4259, comm: /usr/games/gnom Not tainted 2.6.30-rc8-mm1 #312
[ 1162.648843] Call Trace:
[ 1162.651302]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 1162.656786]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 1162.662531]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 1162.667761]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 1162.673511]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 1162.678995]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 1162.685345]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 1162.691347]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 1162.697086]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 1162.703701]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 1162.710401]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 1162.715446]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[ 1162.721012]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 1162.726236]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 1162.732333]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 1162.738088]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 1162.743659]  [<ffffffff81545a95>] page_fault+0x25/0x30
[ 1162.748793] Mem-Info:
[ 1162.751069] Node 0 DMA per-cpu:
[ 1162.754231] CPU    0: hi:    0, btch:   1 usd:   0
[ 1162.759021] CPU    1: hi:    0, btch:   1 usd:   0
[ 1162.763812] Node 0 DMA32 per-cpu:
[ 1162.767147] CPU    0: hi:  186, btch:  31 usd:  90
[ 1162.771930] CPU    1: hi:  186, btch:  31 usd:  89
[ 1162.776719] Active_anon:42484 active_file:760 inactive_anon:48078
[ 1162.776721]  inactive_file:3351 unevictable:4 dirty:0 writeback:0 unstable:0
[ 1162.776722]  free:1174 slab:13329 mapped:3807 pagetables:6487 bounce:0
[ 1162.796351] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5532kB inactive_anon:5812kB active_file:4kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:1408 all_unreclaimable? no
[ 1162.815110] lowmem_reserve[]: 0 483 483 483
[ 1162.819378] Node 0 DMA32 free:2688kB min:2768kB low:3460kB high:4152kB active_anon:164404kB inactive_anon:186500kB active_file:3036kB inactive_file:13404kB unevictable:16kB present:495008kB pages_scanned:40768 all_unreclaimable? no
[ 1162.839863] lowmem_reserve[]: 0 0 0 0
[ 1162.843612] Node 0 DMA: 57*4kB 1*8kB 11*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[ 1162.854539] Node 0 DMA32: 274*4kB 4*8kB 8*16kB 7*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2696kB
[ 1162.865631] 62784 total pagecache pages
[ 1162.869465] 6595 pages in swap cache
[ 1162.873034] Swap cache stats: add 344648, delete 338053, find 37585/169561
[ 1162.879901] Free swap  = 802992kB
[ 1162.883222] Total swap = 1048568kB
[ 1162.891314] 131072 pages RAM
[ 1162.894216] 9628 pages reserved
[ 1162.897365] 74036 pages shared
[ 1162.900414] 58276 pages non-shared
[ 1162.903825] Out of memory: kill process 3487 (run-many-x-apps) score 890891 or a child
[ 1162.911747] Killed process 4113 (gtali)


Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-18  9:19                                 ` Wu Fengguang
  0 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-18  9:19 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > > (even worse this time):
> > > 
> > > Thanks for doing the tests.  Do you know if the time difference comes
> > > from IO or CPU time?
> > > 
> > > Because one reason I could think of is that the original code walks
> > > the readaround window in two directions, starting from the target each
> > > time but immediately stops when it encounters a hole where the new
> > > code just skips holes but doesn't abort readaround and thus might
> > > indeed read more slots.
> > > 
> > > I have an old patch flying around that changed the physical ra code to
> > > use a bitmap that is able to represent holes.  If the increased time
> > > is waiting for IO, I would be interested if that patch has the same
> > > negative impact.
> > 
> > You can send me the patch :)
> 
> Okay, attached is a rebase against latest -mmotm.
> 
> > But for this patch it is IO bound. The CPU iowait field actually is
> > going up as the test goes on:
> 
> It's probably the larger ra window then which takes away the bandwidth
> needed to load the new executables.  This sucks.  Would be nice to
> have 'optional IO' for readahead that is dropped when normal-priority
> IO requests are coming in...  Oh, we have READA for bios.  But it
> doesn't seem to implement dropping requests on load (or I am blind).

Hi Hannes,

Sorry for the long delay! A bad news is that I get many oom with this patch:

[  781.450862] Xorg invoked oom-killer: gfp_mask=0xd2, order=0, oom_adj=0
[  781.457411] Pid: 3272, comm: Xorg Not tainted 2.6.30-rc8-mm1 #312
[  781.463511] Call Trace:
[  781.465976]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  781.471462]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  781.477210]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  781.482449]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  781.488188]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  781.493666]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  781.500015]  [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[  781.505846]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  781.511857]  [<ffffffff810e7fe8>] __vmalloc_area_node+0xf8/0x190
[  781.517869]  [<ffffffffa014c9b5>] ? i915_gem_execbuffer+0xb45/0x12f0 [i915]
[  781.524835]  [<ffffffff810e8121>] __vmalloc_node+0xa1/0xb0
[  781.530346]  [<ffffffffa014c9b5>] ? i915_gem_execbuffer+0xb45/0x12f0 [i915]
[  781.537312]  [<ffffffffa014bf2b>] ? i915_gem_execbuffer+0xbb/0x12f0 [i915]
[  781.544192]  [<ffffffff810e8281>] vmalloc+0x21/0x30
[  781.549100]  [<ffffffffa014c9b5>] i915_gem_execbuffer+0xb45/0x12f0 [i915]
[  781.555920]  [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[  781.561789]  [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[  781.567569]  [<ffffffffa014be70>] ? i915_gem_execbuffer+0x0/0x12f0 [i915]
[  781.574383]  [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[  781.580225]  [<ffffffff8110babd>] vfs_ioctl+0x7d/0xa0
[  781.585287]  [<ffffffff8110bb6a>] do_vfs_ioctl+0x8a/0x580
[  781.590706]  [<ffffffff81078f3a>] ? lockdep_sys_exit+0x2a/0x90
[  781.596552]  [<ffffffff81544b34>] ? lockdep_sys_exit_thunk+0x35/0x67
[  781.602929]  [<ffffffff8110c0aa>] sys_ioctl+0x4a/0x80
[  781.607995]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  781.614005] Mem-Info:
[  781.616293] Node 0 DMA per-cpu:
[  781.619471] CPU    0: hi:    0, btch:   1 usd:   0
[  781.624278] CPU    1: hi:    0, btch:   1 usd:   0
[  781.629080] Node 0 DMA32 per-cpu:
[  781.632443] CPU    0: hi:  186, btch:  31 usd:  83
[  781.637243] CPU    1: hi:  186, btch:  31 usd: 108
[  781.642045] Active_anon:41057 active_file:2334 inactive_anon:47003
[  781.642048]  inactive_file:2148 unevictable:4 dirty:0 writeback:0 unstable:0
[  781.642051]  free:1180 slab:14177 mapped:4473 pagetables:7629 bounce:0
[  781.661802] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5408kB inactive_anon:5676kB active_file:16kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:42276 all_unreclaimable? no
[  781.680773] lowmem_reserve[]: 0 483 483 483
[  781.685089] Node 0 DMA32 free:2704kB min:2768kB low:3460kB high:4152kB active_anon:158820kB inactive_anon:182224kB active_file:9320kB inactive_file:8592kB unevictable:16kB present:495008kB pages_scanned:673623 all_unreclaimable? yes
[  781.705711] lowmem_reserve[]: 0 0 0 0
[  781.709501] Node 0 DMA: 104*4kB 0*8kB 6*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  781.720553] Node 0 DMA32: 318*4kB 1*8kB 1*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2704kB
[  781.731764] 61569 total pagecache pages
[  781.735618] 6489 pages in swap cache
[  781.739212] Swap cache stats: add 285146, delete 278657, find 31455/133061
[  781.746092] Free swap  = 709316kB
[  781.749417] Total swap = 1048568kB
[  781.759726] 131072 pages RAM
[  781.762645] 9628 pages reserved
[  781.765793] 95620 pages shared
[  781.768862] 58466 pages non-shared
[  781.772278] Out of memory: kill process 3487 (run-many-x-apps) score 1471069 or a child
[  781.780291] Killed process 3488 (xeyes)
[  781.830240] gtali invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  781.837208] Pid: 4113, comm: gtali Not tainted 2.6.30-rc8-mm1 #312
[  781.843554] Call Trace:
[  781.846233]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  781.851870]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  781.857615]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  781.862840]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  781.868578]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  781.874054]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  781.880401]  [<ffffffff810f3fb6>] alloc_page_vma+0x86/0x1c0
[  781.885969]  [<ffffffff810e9d08>] read_swap_cache_async+0xd8/0x120
[  781.892147]  [<ffffffff810e9f05>] swapin_readahead+0xb5/0x110
[  781.897886]  [<ffffffff810dac73>] do_swap_page+0x403/0x510
[  781.903366]  [<ffffffff810e9933>] ? lookup_swap_cache+0x13/0x30
[  781.909279]  [<ffffffff810da8ea>] ? do_swap_page+0x7a/0x510
[  781.914850]  [<ffffffff810dc72e>] handle_mm_fault+0x44e/0x500
[  781.920587]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  781.926149]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  781.931287] Mem-Info:
[  781.933559] Node 0 DMA per-cpu:
[  781.936714] CPU    0: hi:    0, btch:   1 usd:   0
[  781.941500] CPU    1: hi:    0, btch:   1 usd:   0
[  781.946288] Node 0 DMA32 per-cpu:
[  781.949615] CPU    0: hi:  186, btch:  31 usd:  84
[  781.954402] CPU    1: hi:  186, btch:  31 usd: 109
[  781.959192] Active_anon:41029 active_file:2334 inactive_anon:46908
[  781.959193]  inactive_file:2211 unevictable:4 dirty:0 writeback:0 unstable:0
[  781.959194]  free:1180 slab:14177 mapped:4492 pagetables:7608 bounce:0
[  781.978897] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5296kB inactive_anon:5408kB active_file:16kB inactive_file:176kB unevictable:0kB present:15164kB pages_scanned:6816 all_unreclaimable? no
[  781.997900] lowmem_reserve[]: 0 483 483 483
[  782.002173] Node 0 DMA32 free:2704kB min:2768kB low:3460kB high:4152kB active_anon:158820kB inactive_anon:182224kB active_file:9320kB inactive_file:8668kB unevictable:16kB present:495008kB pages_scanned:674199 all_unreclaimable? yes
[  782.022740] lowmem_reserve[]: 0 0 0 0
[  782.026488] Node 0 DMA: 82*4kB 9*8kB 7*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  782.037309] Node 0 DMA32: 318*4kB 1*8kB 1*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2704kB
[  782.048405] 61637 total pagecache pages
[  782.052236] 6494 pages in swap cache
[  782.055809] Swap cache stats: add 285154, delete 278660, find 31456/133069
[  782.062672] Free swap  = 709592kB
[  782.065983] Total swap = 1048568kB
[  782.072735] 131072 pages RAM
[  782.075632] 9628 pages reserved
[  782.078774] 95669 pages shared
[  782.081822] 58413 pages non-shared
[  782.085223] Out of memory: kill process 3487 (run-many-x-apps) score 1466556 or a child
[  782.093215] Killed process 3566 (gthumb)
[  790.063897] gnome-panel invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  790.071664] Pid: 3405, comm: gnome-panel Not tainted 2.6.30-rc8-mm1 #312
[  790.078421] Call Trace:
[  790.080902]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  790.086410]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  790.092159]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  790.097387]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  790.103135]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  790.108632]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  790.115001]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  790.121002]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  790.126745]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  790.133352]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  790.140057]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  790.145103]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[  790.150678]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  790.155902]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  790.161989]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  790.167738]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  790.173304]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  790.178441] Mem-Info:
[  790.180714] Node 0 DMA per-cpu:
[  790.183870] CPU    0: hi:    0, btch:   1 usd:   0
[  790.188659] CPU    1: hi:    0, btch:   1 usd:   0
[  790.193446] Node 0 DMA32 per-cpu:
[  790.196783] CPU    0: hi:  186, btch:  31 usd:  43
[  790.201569] CPU    1: hi:  186, btch:  31 usd:  31
[  790.206359] Active_anon:41179 active_file:900 inactive_anon:46967
[  790.206360]  inactive_file:4104 unevictable:4 dirty:0 writeback:0 unstable:0
[  790.206361]  free:1165 slab:13961 mapped:3241 pagetables:7475 bounce:0
[  790.225984] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5496kB inactive_anon:5800kB active_file:4kB inactive_file:220kB unevictable:0kB present:15164kB pages_scanned:26112 all_unreclaimable? yes
[  790.245079] lowmem_reserve[]: 0 483 483 483
[  790.249352] Node 0 DMA32 free:2648kB min:2768kB low:3460kB high:4152kB active_anon:159220kB inactive_anon:182068kB active_file:3596kB inactive_file:16196kB unevictable:16kB present:495008kB pages_scanned:875456 all_unreclaimable? yes
[  790.270005] lowmem_reserve[]: 0 0 0 0
[  790.273762] Node 0 DMA: 53*4kB 9*8kB 12*16kB 2*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  790.284681] Node 0 DMA32: 190*4kB 46*8kB 7*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2648kB
[  790.295866] 62097 total pagecache pages
[  790.299698] 6548 pages in swap cache
[  790.303271] Swap cache stats: add 286032, delete 279484, find 31565/133879
[  790.310137] Free swap  = 717460kB
[  790.313445] Total swap = 1048568kB
[  790.320544] 131072 pages RAM
[  790.323445] 9628 pages reserved
[  790.326591] 85371 pages shared
[  790.329641] 59742 pages non-shared
[  790.333046] Out of memory: kill process 3487 (run-many-x-apps) score 1258333 or a child
[  790.341039] Killed process 3599 (gedit)
[  790.382081] gedit used greatest stack depth: 2064 bytes left
[  792.149572] Xorg invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[  792.156786] Pid: 3272, comm: Xorg Not tainted 2.6.30-rc8-mm1 #312
[  792.162980] Call Trace:
[  792.165429]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  792.170937]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  792.176691]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  792.181909]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  792.187653]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  792.193136]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  792.199490]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  792.205491]  [<ffffffff810c7409>] __get_free_pages+0x9/0x50
[  792.211060]  [<ffffffff8110e402>] __pollwait+0xc2/0x100
[  792.216283]  [<ffffffff81495903>] unix_poll+0x23/0xc0
[  792.221330]  [<ffffffff81419ac8>] sock_poll+0x18/0x20
[  792.226380]  [<ffffffff8110d9a9>] do_select+0x3e9/0x730
[  792.231597]  [<ffffffff8110d5c0>] ? do_select+0x0/0x730
[  792.236816]  [<ffffffff8110e340>] ? __pollwait+0x0/0x100
[  792.242126]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.247180]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.252227]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.257275]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.262331]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.267377]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.272422]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.277468]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.282519]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[  792.287574]  [<ffffffff8110deef>] core_sys_select+0x1ff/0x330
[  792.293317]  [<ffffffff8110dd38>] ? core_sys_select+0x48/0x330
[  792.299162]  [<ffffffffa014954c>] ? i915_gem_throttle_ioctl+0x4c/0x60 [i915]
[  792.306204]  [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[  792.312034]  [<ffffffff810706cc>] ? getnstimeofday+0x5c/0xf0
[  792.317687]  [<ffffffff8106acb9>] ? ktime_get_ts+0x59/0x60
[  792.323169]  [<ffffffff8110e27a>] sys_select+0x4a/0x110
[  792.328387]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[  792.334389] Mem-Info:
[  792.336663] Node 0 DMA per-cpu:
[  792.339824] CPU    0: hi:    0, btch:   1 usd:   0
[  792.344612] CPU    1: hi:    0, btch:   1 usd:   0
[  792.349397] Node 0 DMA32 per-cpu:
[  792.352734] CPU    0: hi:  186, btch:  31 usd:  57
[  792.357518] CPU    1: hi:  186, btch:  31 usd:  50
[  792.362310] Active_anon:40862 active_file:1622 inactive_anon:47020
[  792.362311]  inactive_file:3746 unevictable:4 dirty:0 writeback:0 unstable:0
[  792.362313]  free:1187 slab:13902 mapped:4052 pagetables:7387 bounce:0
[  792.382030] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5428kB inactive_anon:5680kB active_file:0kB inactive_file:224kB unevictable:0kB present:15164kB pages_scanned:4992 all_unreclaimable? no
[  792.400957] lowmem_reserve[]: 0 483 483 483
[  792.405232] Node 0 DMA32 free:2736kB min:2768kB low:3460kB high:4152kB active_anon:158020kB inactive_anon:182284kB active_file:6488kB inactive_file:14760kB unevictable:16kB present:495008kB pages_scanned:876741 all_unreclaimable? yes
[  792.425889] lowmem_reserve[]: 0 0 0 0
[  792.429637] Node 0 DMA: 31*4kB 14*8kB 15*16kB 2*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[  792.440651] Node 0 DMA32: 86*4kB 95*8kB 14*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2736kB
[  792.451821] 62288 total pagecache pages
[  792.455655] 6442 pages in swap cache
[  792.459230] Swap cache stats: add 286223, delete 279781, find 31574/134040
[  792.466100] Free swap  = 723520kB
[  792.469405] Total swap = 1048568kB
[  792.476461] 131072 pages RAM
[  792.479359] 9628 pages reserved
[  792.482502] 86274 pages shared
[  792.485547] 59031 pages non-shared
[  792.488956] Out of memory: kill process 3487 (run-many-x-apps) score 1235901 or a child
[  792.496952] Killed process 3626 (xpdf.bin)
[  912.097890] gnome-control-c invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  912.105967] Pid: 5395, comm: gnome-control-c Not tainted 2.6.30-rc8-mm1 #312
[  912.113042] Call Trace:
[  912.115499]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  912.120994]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  912.126737]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  912.131961]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  912.137709]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  912.143193]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  912.149547]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  912.155551]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  912.161295]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  912.167904]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  912.174602]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  912.179650]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[  912.185221]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  912.190445]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  912.196539]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  912.202278]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  912.207840]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  912.212976] Mem-Info:
[  912.215247] Node 0 DMA per-cpu:
[  912.218402] CPU    0: hi:    0, btch:   1 usd:   0
[  912.223190] CPU    1: hi:    0, btch:   1 usd:   0
[  912.227979] Node 0 DMA32 per-cpu:
[  912.231315] CPU    0: hi:  186, btch:  31 usd: 118
[  912.236100] CPU    1: hi:  186, btch:  31 usd: 158
[  912.240891] Active_anon:42350 active_file:809 inactive_anon:47098
[  912.240892]  inactive_file:2682 unevictable:4 dirty:0 writeback:3 unstable:0
[  912.240893]  free:1164 slab:13886 mapped:3078 pagetables:7561 bounce:0
[  912.260546] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5456kB inactive_anon:5676kB active_file:4kB inactive_file:72kB unevictable:0kB present:15164kB pages_scanned:1920 all_unreclaimable? no
[  912.279403] lowmem_reserve[]: 0 483 483 483
[  912.283671] Node 0 DMA32 free:2600kB min:2768kB low:3460kB high:4152kB active_anon:163944kB inactive_anon:182600kB active_file:3232kB inactive_file:10644kB unevictable:16kB present:495008kB pages_scanned:571360 all_unreclaimable? yes
[  912.304335] lowmem_reserve[]: 0 0 0 0
[  912.308082] Node 0 DMA: 22*4kB 16*8kB 12*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[  912.319093] Node 0 DMA32: 128*4kB 131*8kB 1*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2600kB
[  912.330367] 62393 total pagecache pages
[  912.334201] 7186 pages in swap cache
[  912.337778] Swap cache stats: add 320003, delete 312817, find 34852/153688
[  912.344648] Free swap  = 714408kB
[  912.347950] Total swap = 1048568kB
[  912.355114] 131072 pages RAM
[  912.358011] 9628 pages reserved
[  912.361153] 84608 pages shared
[  912.364199] 58138 pages non-shared
[  912.367606] Out of memory: kill process 3487 (run-many-x-apps) score 1281073 or a child
[  912.375604] Killed process 3669 (xterm)
[  912.427936] tty_ldisc_deref: no references.
[  912.480847] nautilus invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  912.487981] Pid: 3408, comm: nautilus Not tainted 2.6.30-rc8-mm1 #312
[  912.494418] Call Trace:
[  912.496876]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  912.502361]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  912.508100]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  912.513327]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  912.519067]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  912.524552]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  912.530902]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  912.536907]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  912.542645]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  912.549253]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  912.555946]  [<ffffffff810a9c9b>] ? delayacct_end+0x6b/0xa0
[  912.561517]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  912.566563]  [<ffffffff810cacb3>] ondemand_readahead+0x163/0x2d0
[  912.572563]  [<ffffffff810caf25>] page_cache_sync_readahead+0x25/0x30
[  912.579000]  [<ffffffff810c141c>] filemap_fault+0x37c/0x400
[  912.584576]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  912.589799]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  912.595888]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  912.601632]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  912.607206]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  912.612345] Mem-Info:
[  912.614624] Node 0 DMA per-cpu:
[  912.617787] CPU    0: hi:    0, btch:   1 usd:   0
[  912.622570] CPU    1: hi:    0, btch:   1 usd:   0
[  912.627353] Node 0 DMA32 per-cpu:
[  912.630682] CPU    0: hi:  186, btch:  31 usd: 121
[  912.635470] CPU    1: hi:  186, btch:  31 usd:  76
[  912.640259] Active_anon:42310 active_file:830 inactive_anon:47085
[  912.640260]  inactive_file:2747 unevictable:4 dirty:0 writeback:0 unstable:0
[  912.640261]  free:1182 slab:13881 mapped:3111 pagetables:7523 bounce:0
[  912.659881] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5468kB inactive_anon:5784kB active_file:4kB inactive_file:56kB unevictable:0kB present:15164kB pages_scanned:5152 all_unreclaimable? no
[  912.678724] lowmem_reserve[]: 0 483 483 483
[  912.682990] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:163772kB inactive_anon:182556kB active_file:3316kB inactive_file:10932kB unevictable:16kB present:495008kB pages_scanned:51712 all_unreclaimable? no
[  912.703478] lowmem_reserve[]: 0 0 0 0
[  912.707226] Node 0 DMA: 21*4kB 16*8kB 12*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2004kB
[  912.718239] Node 0 DMA32: 159*4kB 132*8kB 1*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2732kB
[  912.729502] 62461 total pagecache pages
[  912.733337] 7171 pages in swap cache
[  912.736915] Swap cache stats: add 320011, delete 312840, find 34852/153696
[  912.743782] Free swap  = 715668kB
[  912.747098] Total swap = 1048568kB
[  912.754168] 131072 pages RAM
[  912.757059] 9628 pages reserved
[  912.760191] 84519 pages shared
[  912.763248] 58139 pages non-shared
[  912.766653] Out of memory: kill process 3487 (run-many-x-apps) score 1273781 or a child
[  912.774647] Killed process 3762 (gnome-terminal)
[  913.650490] tty_ldisc_deref: no references.
[  914.671325] kerneloops-appl invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  914.679083] Pid: 3425, comm: kerneloops-appl Not tainted 2.6.30-rc8-mm1 #312
[  914.686121] Call Trace:
[  914.688575]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  914.694057]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  914.699800]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  914.705034]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  914.710791]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  914.716279]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  914.722640]  [<ffffffff810f3fb6>] alloc_page_vma+0x86/0x1c0
[  914.728208]  [<ffffffff810e9d08>] read_swap_cache_async+0xd8/0x120
[  914.734391]  [<ffffffff810e9f05>] swapin_readahead+0xb5/0x110
[  914.740139]  [<ffffffff810dac73>] do_swap_page+0x403/0x510
[  914.745632]  [<ffffffff810c0710>] ? find_get_page+0x0/0x110
[  914.751200]  [<ffffffff810e9933>] ? lookup_swap_cache+0x13/0x30
[  914.757115]  [<ffffffff810da8ea>] ? do_swap_page+0x7a/0x510
[  914.762688]  [<ffffffff810dc72e>] handle_mm_fault+0x44e/0x500
[  914.768437]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  914.774005]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  914.779136] Mem-Info:
[  914.781410] Node 0 DMA per-cpu:
[  914.784572] CPU    0: hi:    0, btch:   1 usd:   0
[  914.789367] CPU    1: hi:    0, btch:   1 usd:   0
[  914.794156] Node 0 DMA32 per-cpu:
[  914.797493] CPU    0: hi:  186, btch:  31 usd: 150
[  914.802278] CPU    1: hi:  186, btch:  31 usd: 147
[  914.807064] Active_anon:42324 active_file:1285 inactive_anon:47097
[  914.807065]  inactive_file:2225 unevictable:4 dirty:0 writeback:0 unstable:0
[  914.807067]  free:1185 slab:13908 mapped:3648 pagetables:7413 bounce:0
[  914.826781] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5360kB inactive_anon:5784kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:17408 all_unreclaimable? yes
[  914.845718] lowmem_reserve[]: 0 483 483 483
[  914.849988] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:163936kB inactive_anon:182604kB active_file:5140kB inactive_file:8908kB unevictable:16kB present:495008kB pages_scanned:581760 all_unreclaimable? yes
[  914.870559] lowmem_reserve[]: 0 0 0 0
[  914.874306] Node 0 DMA: 37*4kB 10*8kB 12*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[  914.885318] Node 0 DMA32: 119*4kB 139*8kB 7*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2724kB
[  914.896588] 62441 total pagecache pages
[  914.900417] 7199 pages in swap cache
[  914.903999] Swap cache stats: add 320272, delete 313073, find 34864/153895
[  914.910867] Free swap  = 721224kB
[  914.914193] Total swap = 1048568kB
[  914.921489] 131072 pages RAM
[  914.924370] 9628 pages reserved
[  914.927519] 84507 pages shared
[  914.930581] 57535 pages non-shared
[  914.933989] Out of memory: kill process 3487 (run-many-x-apps) score 1213315 or a child
[  914.941986] Killed process 3803 (urxvt)
[  914.947298] tty_ldisc_deref: no references.
[  919.983335] gnome-keyboard- invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[  919.991145] Pid: 5458, comm: gnome-keyboard- Not tainted 2.6.30-rc8-mm1 #312
[  919.998198] Call Trace:
[  920.000663]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  920.006157]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  920.011906]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  920.017135]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  920.022876]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  920.028357]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  920.034706]  [<ffffffff810f3fb6>] alloc_page_vma+0x86/0x1c0
[  920.040280]  [<ffffffff810e9d08>] read_swap_cache_async+0xd8/0x120
[  920.046460]  [<ffffffff810e9f05>] swapin_readahead+0xb5/0x110
[  920.052196]  [<ffffffff810dac73>] do_swap_page+0x403/0x510
[  920.057676]  [<ffffffff810e9933>] ? lookup_swap_cache+0x13/0x30
[  920.063592]  [<ffffffff810da8ea>] ? do_swap_page+0x7a/0x510
[  920.069165]  [<ffffffff810dc72e>] handle_mm_fault+0x44e/0x500
[  920.074901]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  920.080470]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  920.085604] Mem-Info:
[  920.087875] Node 0 DMA per-cpu:
[  920.091031] CPU    0: hi:    0, btch:   1 usd:   0
[  920.095818] CPU    1: hi:    0, btch:   1 usd:   0
[  920.100617] Node 0 DMA32 per-cpu:
[  920.103947] CPU    0: hi:  186, btch:  31 usd:  89
[  920.108734] CPU    1: hi:  186, btch:  31 usd: 119
[  920.113524] Active_anon:42944 active_file:542 inactive_anon:46956
[  920.113525]  inactive_file:2652 unevictable:4 dirty:0 writeback:0 unstable:0
[  920.113526]  free:1169 slab:13893 mapped:3036 pagetables:7342 bounce:0
[  920.133149] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5568kB inactive_anon:5772kB active_file:20kB inactive_file:164kB unevictable:0kB present:15164kB pages_scanned:22824 all_unreclaimable? yes
[  920.152324] lowmem_reserve[]: 0 483 483 483
[  920.156597] Node 0 DMA32 free:2668kB min:2768kB low:3460kB high:4152kB active_anon:166208kB inactive_anon:182052kB active_file:2148kB inactive_file:10444kB unevictable:16kB present:495008kB pages_scanned:650400 all_unreclaimable? yes
[  920.177245] lowmem_reserve[]: 0 0 0 0
[  920.180991] Node 0 DMA: 44*4kB 9*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[  920.191903] Node 0 DMA32: 165*4kB 117*8kB 3*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2668kB
[  920.203169] 62409 total pagecache pages
[  920.207000] 7469 pages in swap cache
[  920.210572] Swap cache stats: add 321003, delete 313534, find 34989/154507
[  920.217436] Free swap  = 725812kB
[  920.220752] Total swap = 1048568kB
[  920.227856] 131072 pages RAM
[  920.230752] 9628 pages reserved
[  920.233901] 78560 pages shared
[  920.236958] 58011 pages non-shared
[  920.240355] Out of memory: kill process 3487 (run-many-x-apps) score 1195965 or a child
[  920.248346] Killed process 3889 (gnome-system-mo)
[  920.993872] nautilus invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  921.001843] Pid: 3408, comm: nautilus Not tainted 2.6.30-rc8-mm1 #312
[  921.008294] Call Trace:
[  921.010757]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  921.016245]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  921.021995]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  921.027215]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  921.032954]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  921.038441]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  921.044805]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  921.050808]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  921.056549]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  921.063163]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  921.069868]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  921.074918]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[  921.080487]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  921.085717]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  921.091805]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  921.097552]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  921.103145]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  921.108280] Mem-Info:
[  921.110556] Node 0 DMA per-cpu:
[  921.113720] CPU    0: hi:    0, btch:   1 usd:   0
[  921.118501] CPU    1: hi:    0, btch:   1 usd:   0
[  921.123286] Node 0 DMA32 per-cpu:
[  921.126614] CPU    0: hi:  186, btch:  31 usd:  25
[  921.131400] CPU    1: hi:  186, btch:  31 usd:  58
[  921.136187] Active_anon:42277 active_file:992 inactive_anon:46953
[  921.136188]  inactive_file:3279 unevictable:4 dirty:0 writeback:0 unstable:0
[  921.136189]  free:1183 slab:13728 mapped:3449 pagetables:7235 bounce:0
[  921.155810] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5540kB inactive_anon:5772kB active_file:20kB inactive_file:224kB unevictable:0kB present:15164kB pages_scanned:18464 all_unreclaimable? yes
[  921.174995] lowmem_reserve[]: 0 483 483 483
[  921.179259] Node 0 DMA32 free:2716kB min:2768kB low:3460kB high:4152kB active_anon:163568kB inactive_anon:182040kB active_file:3948kB inactive_file:12892kB unevictable:16kB present:495008kB pages_scanned:719674 all_unreclaimable? yes
[  921.199914] lowmem_reserve[]: 0 0 0 0
[  921.203661] Node 0 DMA: 50*4kB 7*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[  921.214577] Node 0 DMA32: 257*4kB 45*8kB 19*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2716kB
[  921.225837] 63208 total pagecache pages
[  921.229675] 7214 pages in swap cache
[  921.233249] Swap cache stats: add 321070, delete 313856, find 34991/154562
[  921.240112] Free swap  = 730844kB
[  921.243427] Total swap = 1048568kB
[  921.250566] 131072 pages RAM
[  921.253460] 9628 pages reserved
[  921.256599] 79050 pages shared
[  921.259646] 57895 pages non-shared
[  921.263048] Out of memory: kill process 3487 (run-many-x-apps) score 1168892 or a child
[  921.271042] Killed process 3917 (gnome-help)
[  934.057490] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  934.065285] Pid: 3353, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #312
[  934.072425] Call Trace:
[  934.074882]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  934.080382]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  934.086126]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  934.091349]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  934.097091]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  934.102568]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  934.108914]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  934.114922]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  934.120667]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  934.127269]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  934.133963]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  934.139018]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[  934.144593]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  934.149812]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  934.155898]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  934.161640]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  934.167208]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  934.172348] Mem-Info:
[  934.174614] Node 0 DMA per-cpu:
[  934.177775] CPU    0: hi:    0, btch:   1 usd:   0
[  934.182560] CPU    1: hi:    0, btch:   1 usd:   0
[  934.187342] Node 0 DMA32 per-cpu:
[  934.190671] CPU    0: hi:  186, btch:  31 usd: 115
[  934.195459] CPU    1: hi:  186, btch:  31 usd: 146
[  934.200251] Active_anon:43024 active_file:1381 inactive_anon:46959
[  934.200252]  inactive_file:2292 unevictable:4 dirty:0 writeback:0 unstable:0
[  934.200253]  free:1170 slab:13755 mapped:4121 pagetables:7012 bounce:0
[  934.219958] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5532kB inactive_anon:5756kB active_file:16kB inactive_file:248kB unevictable:0kB present:15164kB pages_scanned:18348 all_unreclaimable? yes
[  934.239142] lowmem_reserve[]: 0 483 483 483
[  934.243408] Node 0 DMA32 free:2680kB min:2768kB low:3460kB high:4152kB active_anon:166564kB inactive_anon:182080kB active_file:5508kB inactive_file:8920kB unevictable:16kB present:495008kB pages_scanned:689667 all_unreclaimable? yes
[  934.263988] lowmem_reserve[]: 0 0 0 0
[  934.267735] Node 0 DMA: 60*4kB 0*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[  934.278662] Node 0 DMA32: 294*4kB 2*8kB 9*16kB 10*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2680kB
[  934.289834] 62846 total pagecache pages
[  934.293669] 7202 pages in swap cache
[  934.297244] Swap cache stats: add 322861, delete 315659, find 35288/156117
[  934.304107] Free swap  = 758748kB
[  934.307422] Total swap = 1048568kB
[  934.314470] 131072 pages RAM
[  934.317362] 9628 pages reserved
[  934.320501] 76930 pages shared
[  934.323549] 57149 pages non-shared
[  934.326955] Out of memory: kill process 3487 (run-many-x-apps) score 1006662 or a child
[  934.334948] Killed process 3952 (gnome-dictionar)
[  934.340708] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[  934.348622] Pid: 3353, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #312
[  934.355318] Call Trace:
[  934.357768]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[  934.363256]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[  934.368998]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[  934.372992]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[  934.372992]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[  934.385506]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[  934.389481]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[  934.397856]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[  934.401848]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[  934.410200]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[  934.416894]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[  934.421942]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[  934.425936]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[  934.432734]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[  934.438822]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[  934.444566]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[  934.448558]  [<ffffffff81545a95>] page_fault+0x25/0x30
[  934.455262] Mem-Info:
[  934.457533] Node 0 DMA per-cpu:
[  934.460695] CPU    0: hi:    0, btch:   1 usd:   0
[  934.464690] CPU    1: hi:    0, btch:   1 usd:   0
[  934.470263] Node 0 DMA32 per-cpu:
[  934.473589] CPU    0: hi:  186, btch:  31 usd: 172
[  934.478377] CPU    1: hi:  186, btch:  31 usd: 145
[  934.482373] Active_anon:42768 active_file:1390 inactive_anon:46967
[  934.482373]  inactive_file:2301 unevictable:4 dirty:0 writeback:0 unstable:0
[  934.482373]  free:1495 slab:13778 mapped:4137 pagetables:6916 bounce:0
[  934.502869] Node 0 DMA free:2060kB min:84kB low:104kB high:124kB active_anon:5492kB inactive_anon:5788kB active_file:28kB inactive_file:252kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[  934.521612] lowmem_reserve[]: 0 483 483 483
[  934.525885] Node 0 DMA32 free:3920kB min:2768kB low:3460kB high:4152kB active_anon:165580kB inactive_anon:182080kB active_file:5532kB inactive_file:8952kB unevictable:16kB present:495008kB pages_scanned:0 all_unreclaimable? no
[  934.545927] lowmem_reserve[]: 0 0 0 0
[  934.549677] Node 0 DMA: 71*4kB 2*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2060kB
[  934.560588] Node 0 DMA32: 588*4kB 10*8kB 9*16kB 10*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3920kB
[  934.568475] 62739 total pagecache pages
[  934.575685] 7086 pages in swap cache
[  934.579254] Swap cache stats: add 322861, delete 315775, find 35288/156117
[  934.586118] Free swap  = 763384kB
[  934.589433] Total swap = 1048568kB
[  934.597155] 131072 pages RAM
[  934.600036] 9628 pages reserved
[  934.600235] 76640 pages shared
[  934.606236] 56884 pages non-shared
[  934.609634] Out of memory: kill process 3487 (run-many-x-apps) score 978701 or a child
[  934.617540] Killed process 4014 (sol)
[ 1028.279307] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 1028.286714] Pid: 5554, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #312
[ 1028.293414] Call Trace:
[ 1028.295874]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 1028.301361]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 1028.307109]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 1028.312330]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 1028.318069]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 1028.323554]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 1028.329900]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 1028.335899]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 1028.341639]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 1028.348247]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 1028.354935]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 1028.359982]  [<ffffffff810cacb3>] ondemand_readahead+0x163/0x2d0
[ 1028.365986]  [<ffffffff810caf25>] page_cache_sync_readahead+0x25/0x30
[ 1028.372422]  [<ffffffff810c141c>] filemap_fault+0x37c/0x400
[ 1028.377985]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 1028.383205]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 1028.389291]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 1028.395031]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 1028.400594]  [<ffffffff81545a95>] page_fault+0x25/0x30
[ 1028.405726] Mem-Info:
[ 1028.408001] Node 0 DMA per-cpu:
[ 1028.411161] CPU    0: hi:    0, btch:   1 usd:   0
[ 1028.416012] CPU    1: hi:    0, btch:   1 usd:   0
[ 1028.420860] Node 0 DMA32 per-cpu:
[ 1028.424346] CPU    0: hi:  186, btch:  31 usd: 125
[ 1028.429129] CPU    1: hi:  186, btch:  31 usd:  17
[ 1028.433914] Active_anon:41222 active_file:1015 inactive_anon:47978
[ 1028.433915]  inactive_file:4149 unevictable:4 dirty:0 writeback:0 unstable:0
[ 1028.433916]  free:1168 slab:13459 mapped:4432 pagetables:6766 bounce:0
[ 1028.453622] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5520kB inactive_anon:5776kB active_file:0kB inactive_file:84kB unevictable:0kB present:15164kB pages_scanned:16704 all_unreclaimable? no
[ 1028.472548] lowmem_reserve[]: 0 483 483 483
[ 1028.476811] Node 0 DMA32 free:2672kB min:2768kB low:3460kB high:4152kB active_anon:159368kB inactive_anon:186136kB active_file:4060kB inactive_file:16512kB unevictable:16kB present:495008kB pages_scanned:566633 all_unreclaimable? yes
[ 1028.497459] lowmem_reserve[]: 0 0 0 0
[ 1028.501203] Node 0 DMA: 56*4kB 0*8kB 11*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[ 1028.512136] Node 0 DMA32: 278*4kB 3*8kB 4*16kB 8*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2672kB
[ 1028.523222] 64013 total pagecache pages
[ 1028.527049] 6900 pages in swap cache
[ 1028.530627] Swap cache stats: add 334539, delete 327639, find 36253/163064
[ 1028.537490] Free swap  = 775384kB
[ 1028.540803] Total swap = 1048568kB
[ 1028.547522] 131072 pages RAM
[ 1028.550399] 9628 pages reserved
[ 1028.553550] 79539 pages shared
[ 1028.556607] 57450 pages non-shared
[ 1028.560008] Out of memory: kill process 3487 (run-many-x-apps) score 938661 or a child
[ 1028.567914] Killed process 4046 (gnometris)
[ 1162.209886] Xorg invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[ 1162.216441] Pid: 3272, comm: Xorg Not tainted 2.6.30-rc8-mm1 #312
[ 1162.222536] Call Trace:
[ 1162.224993]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 1162.230485]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 1162.236231]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 1162.241461]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 1162.247198]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 1162.252677]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 1162.259027]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 1162.265027]  [<ffffffff810c7409>] __get_free_pages+0x9/0x50
[ 1162.270599]  [<ffffffff8110e402>] __pollwait+0xc2/0x100
[ 1162.275815]  [<ffffffff81495903>] unix_poll+0x23/0xc0
[ 1162.280860]  [<ffffffff81419ac8>] sock_poll+0x18/0x20
[ 1162.285907]  [<ffffffff8110d9a9>] do_select+0x3e9/0x730
[ 1162.291129]  [<ffffffff8110d5c0>] ? do_select+0x0/0x730
[ 1162.296349]  [<ffffffff8110e340>] ? __pollwait+0x0/0x100
[ 1162.301659]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.306706]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.311748]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.316792]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.321840]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.326886]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.331933]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.336979]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.342029]  [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.347071]  [<ffffffff8110deef>] core_sys_select+0x1ff/0x330
[ 1162.352807]  [<ffffffff8110dd38>] ? core_sys_select+0x48/0x330
[ 1162.358644]  [<ffffffffa014954c>] ? i915_gem_throttle_ioctl+0x4c/0x60 [i915]
[ 1162.365687]  [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[ 1162.371511]  [<ffffffff810706cc>] ? getnstimeofday+0x5c/0xf0
[ 1162.377161]  [<ffffffff8106acb9>] ? ktime_get_ts+0x59/0x60
[ 1162.382641]  [<ffffffff8110e27a>] sys_select+0x4a/0x110
[ 1162.387863]  [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 1162.393865] Mem-Info:
[ 1162.396132] Node 0 DMA per-cpu:
[ 1162.399294] CPU    0: hi:    0, btch:   1 usd:   0
[ 1162.404076] CPU    1: hi:    0, btch:   1 usd:   0
[ 1162.408858] Node 0 DMA32 per-cpu:
[ 1162.412185] CPU    0: hi:  186, btch:  31 usd: 161
[ 1162.416972] CPU    1: hi:  186, btch:  31 usd: 182
[ 1162.421762] Active_anon:42731 active_file:740 inactive_anon:48110
[ 1162.421763]  inactive_file:2851 unevictable:4 dirty:0 writeback:0 unstable:0
[ 1162.421764]  free:1174 slab:13321 mapped:3702 pagetables:6595 bounce:0
[ 1162.441384] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5552kB inactive_anon:5812kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:9376 all_unreclaimable? no
[ 1162.460128] lowmem_reserve[]: 0 483 483 483
[ 1162.464392] Node 0 DMA32 free:2688kB min:2768kB low:3460kB high:4152kB active_anon:165372kB inactive_anon:186628kB active_file:2960kB inactive_file:11404kB unevictable:16kB present:495008kB pages_scanned:675382 all_unreclaimable? yes
[ 1162.485048] lowmem_reserve[]: 0 0 0 0
[ 1162.488797] Node 0 DMA: 56*4kB 1*8kB 11*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[ 1162.499720] Node 0 DMA32: 274*4kB 3*8kB 8*16kB 7*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2688kB
[ 1162.510803] 62374 total pagecache pages
[ 1162.514635] 6690 pages in swap cache
[ 1162.518210] Swap cache stats: add 344648, delete 337958, find 37585/169560
[ 1162.525071] Free swap  = 796012kB
[ 1162.528385] Total swap = 1048568kB
[ 1162.535461] 131072 pages RAM
[ 1162.538352] 9628 pages reserved
[ 1162.541490] 73953 pages shared
[ 1162.544536] 58149 pages non-shared
[ 1162.547940] Out of memory: kill process 3487 (run-many-x-apps) score 918444 or a child
[ 1162.555846] Killed process 4079 (gnect)
[ 1162.634031] /usr/games/gnom invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 1162.641791] Pid: 4259, comm: /usr/games/gnom Not tainted 2.6.30-rc8-mm1 #312
[ 1162.648843] Call Trace:
[ 1162.651302]  [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 1162.656786]  [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 1162.662531]  [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 1162.667761]  [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 1162.673511]  [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 1162.678995]  [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 1162.685345]  [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 1162.691347]  [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 1162.697086]  [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 1162.703701]  [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 1162.710401]  [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 1162.715446]  [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[ 1162.721012]  [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 1162.726236]  [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 1162.732333]  [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 1162.738088]  [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 1162.743659]  [<ffffffff81545a95>] page_fault+0x25/0x30
[ 1162.748793] Mem-Info:
[ 1162.751069] Node 0 DMA per-cpu:
[ 1162.754231] CPU    0: hi:    0, btch:   1 usd:   0
[ 1162.759021] CPU    1: hi:    0, btch:   1 usd:   0
[ 1162.763812] Node 0 DMA32 per-cpu:
[ 1162.767147] CPU    0: hi:  186, btch:  31 usd:  90
[ 1162.771930] CPU    1: hi:  186, btch:  31 usd:  89
[ 1162.776719] Active_anon:42484 active_file:760 inactive_anon:48078
[ 1162.776721]  inactive_file:3351 unevictable:4 dirty:0 writeback:0 unstable:0
[ 1162.776722]  free:1174 slab:13329 mapped:3807 pagetables:6487 bounce:0
[ 1162.796351] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5532kB inactive_anon:5812kB active_file:4kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:1408 all_unreclaimable? no
[ 1162.815110] lowmem_reserve[]: 0 483 483 483
[ 1162.819378] Node 0 DMA32 free:2688kB min:2768kB low:3460kB high:4152kB active_anon:164404kB inactive_anon:186500kB active_file:3036kB inactive_file:13404kB unevictable:16kB present:495008kB pages_scanned:40768 all_unreclaimable? no
[ 1162.839863] lowmem_reserve[]: 0 0 0 0
[ 1162.843612] Node 0 DMA: 57*4kB 1*8kB 11*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[ 1162.854539] Node 0 DMA32: 274*4kB 4*8kB 8*16kB 7*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2696kB
[ 1162.865631] 62784 total pagecache pages
[ 1162.869465] 6595 pages in swap cache
[ 1162.873034] Swap cache stats: add 344648, delete 338053, find 37585/169561
[ 1162.879901] Free swap  = 802992kB
[ 1162.883222] Total swap = 1048568kB
[ 1162.891314] 131072 pages RAM
[ 1162.894216] 9628 pages reserved
[ 1162.897365] 74036 pages shared
[ 1162.900414] 58276 pages non-shared
[ 1162.903825] Out of memory: kill process 3487 (run-many-x-apps) score 890891 or a child
[ 1162.911747] Killed process 4113 (gtali)


Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-17 22:41     ` Johannes Weiner
@ 2009-06-18  9:29       ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-18  9:29 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

Johannes,

On Thu, Jun 18, 2009 at 06:41:49AM +0800, Johannes Weiner wrote:
> On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 9 Jun 2009 21:01:28 +0200
> > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > [resend with lists cc'd, sorry]
> > > 
> > > +static int swap_readahead_ptes(struct mm_struct *mm,

I suspect the previous unfavorable results are due to comparing things
with/without the drm vmalloc patch. So I spent one day redo the whole
comparisons. The swap readahead patch shows neither big improvements
nor big degradations this time.

Base kernel is 2.6.30-rc8-mm1 with drm vmalloc patch.

a) base kernel
b) base kernel + VM_EXEC protection
c) base kernel + VM_EXEC protection + swap readahead

     (a)         (b)         (c)
    0.02        0.02        0.01    N xeyes
    0.78        0.92        0.77    N firefox
    2.03        2.20        1.97    N nautilus
    3.27        3.35        3.39    N nautilus --browser
    5.10        5.28        4.99    N gthumb
    6.74        7.06        6.64    N gedit
    8.70        8.82        8.47    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
   11.05       10.95       10.94    N
   13.03       12.72       12.79    N xterm
   15.46       15.09       15.10    N mlterm
   18.05       17.31       17.51    N gnome-terminal
   20.59       19.90       19.98    N urxvt
   23.45       22.82       22.67    N
   25.74       25.16       24.96    N gnome-system-monitor
   28.87       27.53       27.89    N gnome-help
   32.37       31.17       31.89    N gnome-dictionary
   36.60       35.18       35.16    N
   39.76       38.04       37.64    N /usr/games/sol
   43.05       42.17       40.33    N /usr/games/gnometris
   47.70       47.08       43.48    N /usr/games/gnect
   51.64       50.46       47.24    N /usr/games/gtali
   56.26       54.58       50.83    N /usr/games/iagno
   60.36       58.01       55.15    N /usr/games/gnotravex
   65.79       62.92       59.28    N /usr/games/mahjongg
   71.59       67.36       65.95    N /usr/games/gnome-sudoku
   78.57       72.32       72.60    N /usr/games/glines
   84.25       80.03       77.42    N /usr/games/glchess
   90.65       88.11       83.66    N /usr/games/gnomine
   97.75       95.13       89.38    N /usr/games/gnotski
  102.99      101.59       95.05    N /usr/games/gnibbles
  110.68      112.05      109.40    N /usr/games/gnobots2
  117.23      121.58      120.05    N /usr/games/blackjack
  125.15      133.59      130.91    N /usr/games/same-gnome
  134.05      151.99      148.91    N
  142.57      162.67      165.00    N /usr/bin/gnome-window-properties
  156.29      174.54      183.84    N /usr/bin/gnome-default-applications-properties
  168.37      190.38      200.99    N /usr/bin/gnome-at-properties
  184.80      209.41      230.82    N /usr/bin/gnome-typing-monitor
  202.05      226.52      250.02    N /usr/bin/gnome-at-visual
  217.60      243.76      272.91    N /usr/bin/gnome-sound-properties
  239.78      266.47      308.74    N /usr/bin/gnome-at-mobility
  255.23      285.42      338.51    N /usr/bin/gnome-keybinding-properties
  276.85      314.84      374.64    N /usr/bin/gnome-about-me
  308.51      355.95      419.78    N /usr/bin/gnome-display-properties
  341.27      401.22      463.55    N /usr/bin/gnome-network-preferences
  393.42      451.27      517.24    N /usr/bin/gnome-mouse-properties
  438.48      510.54      574.64    N /usr/bin/gnome-appearance-properties
  616.09      671.44      760.49    N /usr/bin/gnome-control-center
  879.69      879.45      918.87    N /usr/bin/gnome-keyboard-properties
 1159.47     1076.29     1071.65    N
 1701.82     1240.47     1280.77    N : oocalc
 1921.14     1446.95     1451.82    N : oodraw
 2262.40     1572.95     1698.37    N : ooimpress
 2703.88     1714.53     1841.89    N : oomath
 3464.54     1864.99     1983.96    N : ooweb
 4040.91     2079.96     2185.53    N : oowriter
 4668.16     2330.24     2365.17    N

 Thanks,
 Fengguang


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-18  9:29       ` Wu Fengguang
  0 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-18  9:29 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

Johannes,

On Thu, Jun 18, 2009 at 06:41:49AM +0800, Johannes Weiner wrote:
> On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 9 Jun 2009 21:01:28 +0200
> > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > [resend with lists cc'd, sorry]
> > > 
> > > +static int swap_readahead_ptes(struct mm_struct *mm,

I suspect the previous unfavorable results are due to comparing things
with/without the drm vmalloc patch. So I spent one day redo the whole
comparisons. The swap readahead patch shows neither big improvements
nor big degradations this time.

Base kernel is 2.6.30-rc8-mm1 with drm vmalloc patch.

a) base kernel
b) base kernel + VM_EXEC protection
c) base kernel + VM_EXEC protection + swap readahead

     (a)         (b)         (c)
    0.02        0.02        0.01    N xeyes
    0.78        0.92        0.77    N firefox
    2.03        2.20        1.97    N nautilus
    3.27        3.35        3.39    N nautilus --browser
    5.10        5.28        4.99    N gthumb
    6.74        7.06        6.64    N gedit
    8.70        8.82        8.47    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
   11.05       10.95       10.94    N
   13.03       12.72       12.79    N xterm
   15.46       15.09       15.10    N mlterm
   18.05       17.31       17.51    N gnome-terminal
   20.59       19.90       19.98    N urxvt
   23.45       22.82       22.67    N
   25.74       25.16       24.96    N gnome-system-monitor
   28.87       27.53       27.89    N gnome-help
   32.37       31.17       31.89    N gnome-dictionary
   36.60       35.18       35.16    N
   39.76       38.04       37.64    N /usr/games/sol
   43.05       42.17       40.33    N /usr/games/gnometris
   47.70       47.08       43.48    N /usr/games/gnect
   51.64       50.46       47.24    N /usr/games/gtali
   56.26       54.58       50.83    N /usr/games/iagno
   60.36       58.01       55.15    N /usr/games/gnotravex
   65.79       62.92       59.28    N /usr/games/mahjongg
   71.59       67.36       65.95    N /usr/games/gnome-sudoku
   78.57       72.32       72.60    N /usr/games/glines
   84.25       80.03       77.42    N /usr/games/glchess
   90.65       88.11       83.66    N /usr/games/gnomine
   97.75       95.13       89.38    N /usr/games/gnotski
  102.99      101.59       95.05    N /usr/games/gnibbles
  110.68      112.05      109.40    N /usr/games/gnobots2
  117.23      121.58      120.05    N /usr/games/blackjack
  125.15      133.59      130.91    N /usr/games/same-gnome
  134.05      151.99      148.91    N
  142.57      162.67      165.00    N /usr/bin/gnome-window-properties
  156.29      174.54      183.84    N /usr/bin/gnome-default-applications-properties
  168.37      190.38      200.99    N /usr/bin/gnome-at-properties
  184.80      209.41      230.82    N /usr/bin/gnome-typing-monitor
  202.05      226.52      250.02    N /usr/bin/gnome-at-visual
  217.60      243.76      272.91    N /usr/bin/gnome-sound-properties
  239.78      266.47      308.74    N /usr/bin/gnome-at-mobility
  255.23      285.42      338.51    N /usr/bin/gnome-keybinding-properties
  276.85      314.84      374.64    N /usr/bin/gnome-about-me
  308.51      355.95      419.78    N /usr/bin/gnome-display-properties
  341.27      401.22      463.55    N /usr/bin/gnome-network-preferences
  393.42      451.27      517.24    N /usr/bin/gnome-mouse-properties
  438.48      510.54      574.64    N /usr/bin/gnome-appearance-properties
  616.09      671.44      760.49    N /usr/bin/gnome-control-center
  879.69      879.45      918.87    N /usr/bin/gnome-keyboard-properties
 1159.47     1076.29     1071.65    N
 1701.82     1240.47     1280.77    N : oocalc
 1921.14     1446.95     1451.82    N : oodraw
 2262.40     1572.95     1698.37    N : ooimpress
 2703.88     1714.53     1841.89    N : oomath
 3464.54     1864.99     1983.96    N : ooweb
 4040.91     2079.96     2185.53    N : oowriter
 4668.16     2330.24     2365.17    N

 Thanks,
 Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-18  9:19                                 ` Wu Fengguang
@ 2009-06-18 13:01                                   ` Johannes Weiner
  -1 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-18 13:01 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> > On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > > > (even worse this time):
> > > > 
> > > > Thanks for doing the tests.  Do you know if the time difference comes
> > > > from IO or CPU time?
> > > > 
> > > > Because one reason I could think of is that the original code walks
> > > > the readaround window in two directions, starting from the target each
> > > > time but immediately stops when it encounters a hole where the new
> > > > code just skips holes but doesn't abort readaround and thus might
> > > > indeed read more slots.
> > > > 
> > > > I have an old patch flying around that changed the physical ra code to
> > > > use a bitmap that is able to represent holes.  If the increased time
> > > > is waiting for IO, I would be interested if that patch has the same
> > > > negative impact.
> > > 
> > > You can send me the patch :)
> > 
> > Okay, attached is a rebase against latest -mmotm.
> > 
> > > But for this patch it is IO bound. The CPU iowait field actually is
> > > going up as the test goes on:
> > 
> > It's probably the larger ra window then which takes away the bandwidth
> > needed to load the new executables.  This sucks.  Would be nice to
> > have 'optional IO' for readahead that is dropped when normal-priority
> > IO requests are coming in...  Oh, we have READA for bios.  But it
> > doesn't seem to implement dropping requests on load (or I am blind).
> 
> Hi Hannes,
> 
> Sorry for the long delay! A bad news is that I get many oom with this patch:

Okay, evaluating this test-patch any further probably isn't worth it.
It's too aggressive, I think readahead is stealing pages reclaimed by
other allocations which in turn oom.

Back to the original problem: you detected increased latency for
launching new applications, so they get less share of the IO bandwidth
than without the patch.

I can see two reasons for this:

  a) the new heuristics don't work out and we read more unrelated
  pages than before

  b) we readahead more pages in total as the old code would stop at
  holes, as described above

We can verify a) by comparing major fault numbers between the two
kernels with your testload.  If they increase with my patch, we
anticipate the wrong slots and every fault has do the reading itself.

b) seems to be a trade-off.  After all, the IO resources you have less
for new applications in your test is the bandwidth that is used by
swapping applications.  My qsbench numbers are a sign for this as the
only IO going on is swap.

Of course, the theory is not to improve swap performance by increasing
the readahead window but to choose better readahead candidates.  So I
will run your tests and qsbench with a smaller page cluster and see if
this improves both loads.

Let me no if that doesn't make sense :)

Thanks a lot for all your efforts so far,

	Hannes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-18 13:01                                   ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-18 13:01 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> > On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > > > (even worse this time):
> > > > 
> > > > Thanks for doing the tests.  Do you know if the time difference comes
> > > > from IO or CPU time?
> > > > 
> > > > Because one reason I could think of is that the original code walks
> > > > the readaround window in two directions, starting from the target each
> > > > time but immediately stops when it encounters a hole where the new
> > > > code just skips holes but doesn't abort readaround and thus might
> > > > indeed read more slots.
> > > > 
> > > > I have an old patch flying around that changed the physical ra code to
> > > > use a bitmap that is able to represent holes.  If the increased time
> > > > is waiting for IO, I would be interested if that patch has the same
> > > > negative impact.
> > > 
> > > You can send me the patch :)
> > 
> > Okay, attached is a rebase against latest -mmotm.
> > 
> > > But for this patch it is IO bound. The CPU iowait field actually is
> > > going up as the test goes on:
> > 
> > It's probably the larger ra window then which takes away the bandwidth
> > needed to load the new executables.  This sucks.  Would be nice to
> > have 'optional IO' for readahead that is dropped when normal-priority
> > IO requests are coming in...  Oh, we have READA for bios.  But it
> > doesn't seem to implement dropping requests on load (or I am blind).
> 
> Hi Hannes,
> 
> Sorry for the long delay! A bad news is that I get many oom with this patch:

Okay, evaluating this test-patch any further probably isn't worth it.
It's too aggressive, I think readahead is stealing pages reclaimed by
other allocations which in turn oom.

Back to the original problem: you detected increased latency for
launching new applications, so they get less share of the IO bandwidth
than without the patch.

I can see two reasons for this:

  a) the new heuristics don't work out and we read more unrelated
  pages than before

  b) we readahead more pages in total as the old code would stop at
  holes, as described above

We can verify a) by comparing major fault numbers between the two
kernels with your testload.  If they increase with my patch, we
anticipate the wrong slots and every fault has do the reading itself.

b) seems to be a trade-off.  After all, the IO resources you have less
for new applications in your test is the bandwidth that is used by
swapping applications.  My qsbench numbers are a sign for this as the
only IO going on is swap.

Of course, the theory is not to improve swap performance by increasing
the readahead window but to choose better readahead candidates.  So I
will run your tests and qsbench with a smaller page cluster and see if
this improves both loads.

Let me no if that doesn't make sense :)

Thanks a lot for all your efforts so far,

	Hannes

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-18  9:29       ` Wu Fengguang
@ 2009-06-18 13:09         ` Johannes Weiner
  -1 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-18 13:09 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

On Thu, Jun 18, 2009 at 05:29:47PM +0800, Wu Fengguang wrote:
> Johannes,
> 
> On Thu, Jun 18, 2009 at 06:41:49AM +0800, Johannes Weiner wrote:
> > On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 9 Jun 2009 21:01:28 +0200
> > > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > [resend with lists cc'd, sorry]
> > > > 
> > > > +static int swap_readahead_ptes(struct mm_struct *mm,
> 
> I suspect the previous unfavorable results are due to comparing things
> with/without the drm vmalloc patch. So I spent one day redo the whole
> comparisons. The swap readahead patch shows neither big improvements
> nor big degradations this time.

Thanks again!  Nice.  So according to this, vswapra doesn't increase
other IO latency (much) but boosts ongoing swap loads (quite some) (as
qsbench showed).  Is that a result or what! :)

I will see how the tests described in the other mail work out.

> Base kernel is 2.6.30-rc8-mm1 with drm vmalloc patch.
> 
> a) base kernel
> b) base kernel + VM_EXEC protection
> c) base kernel + VM_EXEC protection + swap readahead
> 
>      (a)         (b)         (c)
>     0.02        0.02        0.01    N xeyes
>     0.78        0.92        0.77    N firefox
>     2.03        2.20        1.97    N nautilus
>     3.27        3.35        3.39    N nautilus --browser
>     5.10        5.28        4.99    N gthumb
>     6.74        7.06        6.64    N gedit
>     8.70        8.82        8.47    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
>    11.05       10.95       10.94    N
>    13.03       12.72       12.79    N xterm
>    15.46       15.09       15.10    N mlterm
>    18.05       17.31       17.51    N gnome-terminal
>    20.59       19.90       19.98    N urxvt
>    23.45       22.82       22.67    N
>    25.74       25.16       24.96    N gnome-system-monitor
>    28.87       27.53       27.89    N gnome-help
>    32.37       31.17       31.89    N gnome-dictionary
>    36.60       35.18       35.16    N
>    39.76       38.04       37.64    N /usr/games/sol
>    43.05       42.17       40.33    N /usr/games/gnometris
>    47.70       47.08       43.48    N /usr/games/gnect
>    51.64       50.46       47.24    N /usr/games/gtali
>    56.26       54.58       50.83    N /usr/games/iagno
>    60.36       58.01       55.15    N /usr/games/gnotravex
>    65.79       62.92       59.28    N /usr/games/mahjongg
>    71.59       67.36       65.95    N /usr/games/gnome-sudoku
>    78.57       72.32       72.60    N /usr/games/glines
>    84.25       80.03       77.42    N /usr/games/glchess
>    90.65       88.11       83.66    N /usr/games/gnomine
>    97.75       95.13       89.38    N /usr/games/gnotski
>   102.99      101.59       95.05    N /usr/games/gnibbles
>   110.68      112.05      109.40    N /usr/games/gnobots2
>   117.23      121.58      120.05    N /usr/games/blackjack
>   125.15      133.59      130.91    N /usr/games/same-gnome
>   134.05      151.99      148.91    N
>   142.57      162.67      165.00    N /usr/bin/gnome-window-properties
>   156.29      174.54      183.84    N /usr/bin/gnome-default-applications-properties
>   168.37      190.38      200.99    N /usr/bin/gnome-at-properties
>   184.80      209.41      230.82    N /usr/bin/gnome-typing-monitor
>   202.05      226.52      250.02    N /usr/bin/gnome-at-visual
>   217.60      243.76      272.91    N /usr/bin/gnome-sound-properties
>   239.78      266.47      308.74    N /usr/bin/gnome-at-mobility
>   255.23      285.42      338.51    N /usr/bin/gnome-keybinding-properties
>   276.85      314.84      374.64    N /usr/bin/gnome-about-me
>   308.51      355.95      419.78    N /usr/bin/gnome-display-properties
>   341.27      401.22      463.55    N /usr/bin/gnome-network-preferences
>   393.42      451.27      517.24    N /usr/bin/gnome-mouse-properties
>   438.48      510.54      574.64    N /usr/bin/gnome-appearance-properties
>   616.09      671.44      760.49    N /usr/bin/gnome-control-center
>   879.69      879.45      918.87    N /usr/bin/gnome-keyboard-properties
>  1159.47     1076.29     1071.65    N
>  1701.82     1240.47     1280.77    N : oocalc
>  1921.14     1446.95     1451.82    N : oodraw
>  2262.40     1572.95     1698.37    N : ooimpress
>  2703.88     1714.53     1841.89    N : oomath
>  3464.54     1864.99     1983.96    N : ooweb
>  4040.91     2079.96     2185.53    N : oowriter
>  4668.16     2330.24     2365.17    N
> 
>  Thanks,
>  Fengguang
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-18 13:09         ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-18 13:09 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

On Thu, Jun 18, 2009 at 05:29:47PM +0800, Wu Fengguang wrote:
> Johannes,
> 
> On Thu, Jun 18, 2009 at 06:41:49AM +0800, Johannes Weiner wrote:
> > On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 9 Jun 2009 21:01:28 +0200
> > > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > [resend with lists cc'd, sorry]
> > > > 
> > > > +static int swap_readahead_ptes(struct mm_struct *mm,
> 
> I suspect the previous unfavorable results are due to comparing things
> with/without the drm vmalloc patch. So I spent one day redo the whole
> comparisons. The swap readahead patch shows neither big improvements
> nor big degradations this time.

Thanks again!  Nice.  So according to this, vswapra doesn't increase
other IO latency (much) but boosts ongoing swap loads (quite some) (as
qsbench showed).  Is that a result or what! :)

I will see how the tests described in the other mail work out.

> Base kernel is 2.6.30-rc8-mm1 with drm vmalloc patch.
> 
> a) base kernel
> b) base kernel + VM_EXEC protection
> c) base kernel + VM_EXEC protection + swap readahead
> 
>      (a)         (b)         (c)
>     0.02        0.02        0.01    N xeyes
>     0.78        0.92        0.77    N firefox
>     2.03        2.20        1.97    N nautilus
>     3.27        3.35        3.39    N nautilus --browser
>     5.10        5.28        4.99    N gthumb
>     6.74        7.06        6.64    N gedit
>     8.70        8.82        8.47    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
>    11.05       10.95       10.94    N
>    13.03       12.72       12.79    N xterm
>    15.46       15.09       15.10    N mlterm
>    18.05       17.31       17.51    N gnome-terminal
>    20.59       19.90       19.98    N urxvt
>    23.45       22.82       22.67    N
>    25.74       25.16       24.96    N gnome-system-monitor
>    28.87       27.53       27.89    N gnome-help
>    32.37       31.17       31.89    N gnome-dictionary
>    36.60       35.18       35.16    N
>    39.76       38.04       37.64    N /usr/games/sol
>    43.05       42.17       40.33    N /usr/games/gnometris
>    47.70       47.08       43.48    N /usr/games/gnect
>    51.64       50.46       47.24    N /usr/games/gtali
>    56.26       54.58       50.83    N /usr/games/iagno
>    60.36       58.01       55.15    N /usr/games/gnotravex
>    65.79       62.92       59.28    N /usr/games/mahjongg
>    71.59       67.36       65.95    N /usr/games/gnome-sudoku
>    78.57       72.32       72.60    N /usr/games/glines
>    84.25       80.03       77.42    N /usr/games/glchess
>    90.65       88.11       83.66    N /usr/games/gnomine
>    97.75       95.13       89.38    N /usr/games/gnotski
>   102.99      101.59       95.05    N /usr/games/gnibbles
>   110.68      112.05      109.40    N /usr/games/gnobots2
>   117.23      121.58      120.05    N /usr/games/blackjack
>   125.15      133.59      130.91    N /usr/games/same-gnome
>   134.05      151.99      148.91    N
>   142.57      162.67      165.00    N /usr/bin/gnome-window-properties
>   156.29      174.54      183.84    N /usr/bin/gnome-default-applications-properties
>   168.37      190.38      200.99    N /usr/bin/gnome-at-properties
>   184.80      209.41      230.82    N /usr/bin/gnome-typing-monitor
>   202.05      226.52      250.02    N /usr/bin/gnome-at-visual
>   217.60      243.76      272.91    N /usr/bin/gnome-sound-properties
>   239.78      266.47      308.74    N /usr/bin/gnome-at-mobility
>   255.23      285.42      338.51    N /usr/bin/gnome-keybinding-properties
>   276.85      314.84      374.64    N /usr/bin/gnome-about-me
>   308.51      355.95      419.78    N /usr/bin/gnome-display-properties
>   341.27      401.22      463.55    N /usr/bin/gnome-network-preferences
>   393.42      451.27      517.24    N /usr/bin/gnome-mouse-properties
>   438.48      510.54      574.64    N /usr/bin/gnome-appearance-properties
>   616.09      671.44      760.49    N /usr/bin/gnome-control-center
>   879.69      879.45      918.87    N /usr/bin/gnome-keyboard-properties
>  1159.47     1076.29     1071.65    N
>  1701.82     1240.47     1280.77    N : oocalc
>  1921.14     1446.95     1451.82    N : oodraw
>  2262.40     1572.95     1698.37    N : ooimpress
>  2703.88     1714.53     1841.89    N : oomath
>  3464.54     1864.99     1983.96    N : ooweb
>  4040.91     2079.96     2185.53    N : oowriter
>  4668.16     2330.24     2365.17    N
> 
>  Thanks,
>  Fengguang
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-18 13:09         ` Johannes Weiner
  (?)
@ 2009-06-19  3:17         ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-19  3:17 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Rik van Riel, Hugh Dickins,
	Andi Kleen, Minchan Kim, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6185 bytes --]

On Thu, Jun 18, 2009 at 09:09:34PM +0800, Johannes Weiner wrote:
> On Thu, Jun 18, 2009 at 05:29:47PM +0800, Wu Fengguang wrote:
> > Johannes,
> > 
> > On Thu, Jun 18, 2009 at 06:41:49AM +0800, Johannes Weiner wrote:
> > > On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > On Tue, 9 Jun 2009 21:01:28 +0200
> > > > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > > [resend with lists cc'd, sorry]
> > > > > 
> > > > > +static int swap_readahead_ptes(struct mm_struct *mm,
> > 
> > I suspect the previous unfavorable results are due to comparing things
> > with/without the drm vmalloc patch. So I spent one day redo the whole
> > comparisons. The swap readahead patch shows neither big improvements
> > nor big degradations this time.
> 
> Thanks again!  Nice.  So according to this, vswapra doesn't increase
> other IO latency (much) but boosts ongoing swap loads (quite some) (as
> qsbench showed).  Is that a result or what! :)
> 
> I will see how the tests described in the other mail work out.

And here are the /proc/vmstat contents after each test run :)

The pswpin number goes down radically in case (c) which seems
illogical.

     pgpgin 8898235              pgpgin 4828771              pgpgin 1807731                           
     pgpgout 1806868             pgpgout 1463644             pgpgout 1382244                          
==>  pswpin 2222503              pswpin 1205137              pswpin 449877                            
     pswpout 451716              pswpout 365910              pswpout 345560                           
     pgalloc_dma 39883           pgalloc_dma 24343           pgalloc_dma 3547                         
     pgalloc_dma32 11918819      pgalloc_dma32 6810775       pgalloc_dma32 6387602                    
     pgalloc_normal 0            pgalloc_normal 0            pgalloc_normal 0
     pgalloc_movable 0           pgalloc_movable 0           pgalloc_movable 0
     pgfree 11961651             pgfree 6837658              pgfree 6396229                           
     pgactivate 5771012          pgactivate 2999101          pgactivate 2341219                       
     pgdeactivate 5909300        pgdeactivate 3140474        pgdeactivate 2481319                     
     pgfault 4536082             pgfault 3468555             pgfault 3589046                          
==>  pgmajfault 926383           pgmajfault 506265           pgmajfault 520010                        

Thanks,
Fengguang

> > Base kernel is 2.6.30-rc8-mm1 with drm vmalloc patch.
> > 
> > a) base kernel
> > b) base kernel + VM_EXEC protection
> > c) base kernel + VM_EXEC protection + swap readahead
> > 
> >      (a)         (b)         (c)
> >     0.02        0.02        0.01    N xeyes
> >     0.78        0.92        0.77    N firefox
> >     2.03        2.20        1.97    N nautilus
> >     3.27        3.35        3.39    N nautilus --browser
> >     5.10        5.28        4.99    N gthumb
> >     6.74        7.06        6.64    N gedit
> >     8.70        8.82        8.47    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
> >    11.05       10.95       10.94    N
> >    13.03       12.72       12.79    N xterm
> >    15.46       15.09       15.10    N mlterm
> >    18.05       17.31       17.51    N gnome-terminal
> >    20.59       19.90       19.98    N urxvt
> >    23.45       22.82       22.67    N
> >    25.74       25.16       24.96    N gnome-system-monitor
> >    28.87       27.53       27.89    N gnome-help
> >    32.37       31.17       31.89    N gnome-dictionary
> >    36.60       35.18       35.16    N
> >    39.76       38.04       37.64    N /usr/games/sol
> >    43.05       42.17       40.33    N /usr/games/gnometris
> >    47.70       47.08       43.48    N /usr/games/gnect
> >    51.64       50.46       47.24    N /usr/games/gtali
> >    56.26       54.58       50.83    N /usr/games/iagno
> >    60.36       58.01       55.15    N /usr/games/gnotravex
> >    65.79       62.92       59.28    N /usr/games/mahjongg
> >    71.59       67.36       65.95    N /usr/games/gnome-sudoku
> >    78.57       72.32       72.60    N /usr/games/glines
> >    84.25       80.03       77.42    N /usr/games/glchess
> >    90.65       88.11       83.66    N /usr/games/gnomine
> >    97.75       95.13       89.38    N /usr/games/gnotski
> >   102.99      101.59       95.05    N /usr/games/gnibbles
> >   110.68      112.05      109.40    N /usr/games/gnobots2
> >   117.23      121.58      120.05    N /usr/games/blackjack
> >   125.15      133.59      130.91    N /usr/games/same-gnome
> >   134.05      151.99      148.91    N
> >   142.57      162.67      165.00    N /usr/bin/gnome-window-properties
> >   156.29      174.54      183.84    N /usr/bin/gnome-default-applications-properties
> >   168.37      190.38      200.99    N /usr/bin/gnome-at-properties
> >   184.80      209.41      230.82    N /usr/bin/gnome-typing-monitor
> >   202.05      226.52      250.02    N /usr/bin/gnome-at-visual
> >   217.60      243.76      272.91    N /usr/bin/gnome-sound-properties
> >   239.78      266.47      308.74    N /usr/bin/gnome-at-mobility
> >   255.23      285.42      338.51    N /usr/bin/gnome-keybinding-properties
> >   276.85      314.84      374.64    N /usr/bin/gnome-about-me
> >   308.51      355.95      419.78    N /usr/bin/gnome-display-properties
> >   341.27      401.22      463.55    N /usr/bin/gnome-network-preferences
> >   393.42      451.27      517.24    N /usr/bin/gnome-mouse-properties
> >   438.48      510.54      574.64    N /usr/bin/gnome-appearance-properties
> >   616.09      671.44      760.49    N /usr/bin/gnome-control-center
> >   879.69      879.45      918.87    N /usr/bin/gnome-keyboard-properties
> >  1159.47     1076.29     1071.65    N
> >  1701.82     1240.47     1280.77    N : oocalc
> >  1921.14     1446.95     1451.82    N : oodraw
> >  2262.40     1572.95     1698.37    N : ooimpress
> >  2703.88     1714.53     1841.89    N : oomath
> >  3464.54     1864.99     1983.96    N : ooweb
> >  4040.91     2079.96     2185.53    N : oowriter
> >  4668.16     2330.24     2365.17    N
> > 
> >  Thanks,
> >  Fengguang
> > 

[-- Attachment #2: vmstat.0 --]
[-- Type: text/plain, Size: 1444 bytes --]

nr_free_pages 2774
nr_inactive_anon 49669
nr_active_anon 37887
nr_inactive_file 1943
nr_active_file 1432
nr_unevictable 4
nr_mlock 4
nr_anon_pages 33433
nr_mapped 3748
nr_file_pages 63113
nr_dirty 0
nr_writeback 14
nr_slab_reclaimable 3067
nr_slab_unreclaimable 11016
nr_page_table_pages 7733
nr_unstable 0
nr_bounce 0
nr_vmscan_write 452422
nr_writeback_temp 0
numa_hit 11905133
numa_miss 0
numa_foreign 0
numa_interleave 1719
numa_local 11905133
numa_other 0
pgpgin 8898235
pgpgout 1806868
pswpin 2222503
pswpout 451716
pgalloc_dma 39883
pgalloc_dma32 11918819
pgalloc_normal 0
pgalloc_movable 0
pgfree 11961651
pgactivate 5771012
pgdeactivate 5909300
pgfault 4536082
pgmajfault 926383
pgrefill_dma 3358
pgrefill_dma32 327639
pgrefill_normal 0
pgrefill_movable 0
pgsteal_dma 4163
pgsteal_dma32 9004008
pgsteal_normal 0
pgsteal_movable 0
pgscan_kswapd_dma 14283579
pgscan_kswapd_dma32 440003821
pgscan_kswapd_normal 0
pgscan_kswapd_movable 0
pgscan_direct_dma 2518976
pgscan_direct_dma32 85187744
pgscan_direct_normal 0
pgscan_direct_movable 0
pginodesteal 4578
slabs_scanned 567936
kswapd_steal 8653718
kswapd_inodesteal 11378
pageoutrun 154601
allocstall 7487
pgrotated 438820
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 0
unevictable_pgs_mlocked 4
unevictable_pgs_munlocked 0
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

[-- Attachment #3: vmstat.1 --]
[-- Type: text/plain, Size: 1437 bytes --]

nr_free_pages 2441
nr_inactive_anon 48016
nr_active_anon 38622
nr_inactive_file 3402
nr_active_file 1641
nr_unevictable 4
nr_mlock 4
nr_anon_pages 34731
nr_mapped 3599
nr_file_pages 62566
nr_dirty 0
nr_writeback 2
nr_slab_reclaimable 3091
nr_slab_unreclaimable 10664
nr_page_table_pages 7769
nr_unstable 0
nr_bounce 0
nr_vmscan_write 366662
nr_writeback_temp 0
numa_hit 6789271
numa_miss 0
numa_foreign 0
numa_interleave 1719
numa_local 6789271
numa_other 0
pgpgin 4828771
pgpgout 1463644
pswpin 1205137
pswpout 365910
pgalloc_dma 24343
pgalloc_dma32 6810775
pgalloc_normal 0
pgalloc_movable 0
pgfree 6837658
pgactivate 2999101
pgdeactivate 3140474
pgfault 3468555
pgmajfault 506265
pgrefill_dma 2336
pgrefill_dma32 729107
pgrefill_normal 0
pgrefill_movable 0
pgsteal_dma 1913
pgsteal_dma32 4357494
pgsteal_normal 0
pgsteal_movable 0
pgscan_kswapd_dma 6712845
pgscan_kswapd_dma32 286604714
pgscan_kswapd_normal 0
pgscan_kswapd_movable 0
pgscan_direct_dma 1341301
pgscan_direct_dma32 59422832
pgscan_direct_normal 0
pgscan_direct_movable 0
pginodesteal 4612
slabs_scanned 575616
kswapd_steal 4132786
kswapd_inodesteal 13238
pageoutrun 68758
allocstall 4576
pgrotated 359071
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 0
unevictable_pgs_mlocked 4
unevictable_pgs_munlocked 0
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

[-- Attachment #4: vmstat.2 --]
[-- Type: text/plain, Size: 1429 bytes --]

nr_free_pages 5027
nr_inactive_anon 45647
nr_active_anon 39133
nr_inactive_file 2213
nr_active_file 2204
nr_unevictable 4
nr_mlock 4
nr_anon_pages 34044
nr_mapped 4110
nr_file_pages 60736
nr_dirty 0
nr_writeback 2
nr_slab_reclaimable 3024
nr_slab_unreclaimable 10694
nr_page_table_pages 7737
nr_unstable 0
nr_bounce 0
nr_vmscan_write 346130
nr_writeback_temp 0
numa_hit 6348796
numa_miss 0
numa_foreign 0
numa_interleave 1719
numa_local 6348796
numa_other 0
pgpgin 1807731
pgpgout 1382244
pswpin 449877
pswpout 345560
pgalloc_dma 3547
pgalloc_dma32 6387602
pgalloc_normal 0
pgalloc_movable 0
pgfree 6396229
pgactivate 2341219
pgdeactivate 2481319
pgfault 3589046
pgmajfault 520010
pgrefill_dma 1760
pgrefill_dma32 704673
pgrefill_normal 0
pgrefill_movable 0
pgsteal_dma 0
pgsteal_dma32 3801681
pgsteal_normal 0
pgsteal_movable 0
pgscan_kswapd_dma 9155325
pgscan_kswapd_dma32 225882967
pgscan_kswapd_normal 0
pgscan_kswapd_movable 0
pgscan_direct_dma 89949
pgscan_direct_dma32 2499274
pgscan_direct_normal 0
pgscan_direct_movable 0
pginodesteal 3410
slabs_scanned 544000
kswapd_steal 3618518
kswapd_inodesteal 11438
pageoutrun 59774
allocstall 4014
pgrotated 326236
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 0
unevictable_pgs_mlocked 4
unevictable_pgs_munlocked 0
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-18 13:01                                   ` Johannes Weiner
@ 2009-06-19  3:30                                     ` Wu Fengguang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-19  3:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Thu, Jun 18, 2009 at 09:01:21PM +0800, Johannes Weiner wrote:
> On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> > On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> > > On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > > > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > > > > (even worse this time):
> > > > > 
> > > > > Thanks for doing the tests.  Do you know if the time difference comes
> > > > > from IO or CPU time?
> > > > > 
> > > > > Because one reason I could think of is that the original code walks
> > > > > the readaround window in two directions, starting from the target each
> > > > > time but immediately stops when it encounters a hole where the new
> > > > > code just skips holes but doesn't abort readaround and thus might
> > > > > indeed read more slots.
> > > > > 
> > > > > I have an old patch flying around that changed the physical ra code to
> > > > > use a bitmap that is able to represent holes.  If the increased time
> > > > > is waiting for IO, I would be interested if that patch has the same
> > > > > negative impact.
> > > > 
> > > > You can send me the patch :)
> > > 
> > > Okay, attached is a rebase against latest -mmotm.
> > > 
> > > > But for this patch it is IO bound. The CPU iowait field actually is
> > > > going up as the test goes on:
> > > 
> > > It's probably the larger ra window then which takes away the bandwidth
> > > needed to load the new executables.  This sucks.  Would be nice to
> > > have 'optional IO' for readahead that is dropped when normal-priority
> > > IO requests are coming in...  Oh, we have READA for bios.  But it
> > > doesn't seem to implement dropping requests on load (or I am blind).
> > 
> > Hi Hannes,
> > 
> > Sorry for the long delay! A bad news is that I get many oom with this patch:
> 
> Okay, evaluating this test-patch any further probably isn't worth it.
> It's too aggressive, I think readahead is stealing pages reclaimed by
> other allocations which in turn oom.

OK.

> Back to the original problem: you detected increased latency for
> launching new applications, so they get less share of the IO bandwidth

There are no "launch new app" phase. The test flow works like:

  for all apps {
        for all started apps {
                activate its GUI window
        }
        start one new app
  }
        
But yes, as time goes by, the test becomes more and more about
switching between existing windows under high memory pressure.

> than without the patch.
> 
> I can see two reasons for this:
> 
>   a) the new heuristics don't work out and we read more unrelated
>   pages than before
> 
>   b) we readahead more pages in total as the old code would stop at
>   holes, as described above
> 
> We can verify a) by comparing major fault numbers between the two

Plus pswpin numbers :) I found it significantly decreased when we do
pte swap readahead..  See another email.

> kernels with your testload.  If they increase with my patch, we
> anticipate the wrong slots and every fault has do the reading itself.
> 
> b) seems to be a trade-off.  After all, the IO resources you have less
> for new applications in your test is the bandwidth that is used by
> swapping applications.  My qsbench numbers are a sign for this as the
> only IO going on is swap.
> 
> Of course, the theory is not to improve swap performance by increasing
> the readahead window but to choose better readahead candidates.  So I
> will run your tests and qsbench with a smaller page cluster and see if
> this improves both loads.

The general principle is, any non sector number based readahead should
be really accurate in order to be a net gain. Because each readahead
page miss will lead to one disk seek, which is much more costly than
wasting a memory page.

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-19  3:30                                     ` Wu Fengguang
  0 siblings, 0 replies; 55+ messages in thread
From: Wu Fengguang @ 2009-06-19  3:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Thu, Jun 18, 2009 at 09:01:21PM +0800, Johannes Weiner wrote:
> On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> > On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> > > On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > > > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > > > > (even worse this time):
> > > > > 
> > > > > Thanks for doing the tests.  Do you know if the time difference comes
> > > > > from IO or CPU time?
> > > > > 
> > > > > Because one reason I could think of is that the original code walks
> > > > > the readaround window in two directions, starting from the target each
> > > > > time but immediately stops when it encounters a hole where the new
> > > > > code just skips holes but doesn't abort readaround and thus might
> > > > > indeed read more slots.
> > > > > 
> > > > > I have an old patch flying around that changed the physical ra code to
> > > > > use a bitmap that is able to represent holes.  If the increased time
> > > > > is waiting for IO, I would be interested if that patch has the same
> > > > > negative impact.
> > > > 
> > > > You can send me the patch :)
> > > 
> > > Okay, attached is a rebase against latest -mmotm.
> > > 
> > > > But for this patch it is IO bound. The CPU iowait field actually is
> > > > going up as the test goes on:
> > > 
> > > It's probably the larger ra window then which takes away the bandwidth
> > > needed to load the new executables.  This sucks.  Would be nice to
> > > have 'optional IO' for readahead that is dropped when normal-priority
> > > IO requests are coming in...  Oh, we have READA for bios.  But it
> > > doesn't seem to implement dropping requests on load (or I am blind).
> > 
> > Hi Hannes,
> > 
> > Sorry for the long delay! A bad news is that I get many oom with this patch:
> 
> Okay, evaluating this test-patch any further probably isn't worth it.
> It's too aggressive, I think readahead is stealing pages reclaimed by
> other allocations which in turn oom.

OK.

> Back to the original problem: you detected increased latency for
> launching new applications, so they get less share of the IO bandwidth

There are no "launch new app" phase. The test flow works like:

  for all apps {
        for all started apps {
                activate its GUI window
        }
        start one new app
  }
        
But yes, as time goes by, the test becomes more and more about
switching between existing windows under high memory pressure.

> than without the patch.
> 
> I can see two reasons for this:
> 
>   a) the new heuristics don't work out and we read more unrelated
>   pages than before
> 
>   b) we readahead more pages in total as the old code would stop at
>   holes, as described above
> 
> We can verify a) by comparing major fault numbers between the two

Plus pswpin numbers :) I found it significantly decreased when we do
pte swap readahead..  See another email.

> kernels with your testload.  If they increase with my patch, we
> anticipate the wrong slots and every fault has do the reading itself.
> 
> b) seems to be a trade-off.  After all, the IO resources you have less
> for new applications in your test is the bandwidth that is used by
> swapping applications.  My qsbench numbers are a sign for this as the
> only IO going on is swap.
> 
> Of course, the theory is not to improve swap performance by increasing
> the readahead window but to choose better readahead candidates.  So I
> will run your tests and qsbench with a smaller page cluster and see if
> this improves both loads.

The general principle is, any non sector number based readahead should
be really accurate in order to be a net gain. Because each readahead
page miss will lead to one disk seek, which is much more costly than
wasting a memory page.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-18 13:01                                   ` Johannes Weiner
@ 2009-06-21 18:07                                     ` Hugh Dickins
  -1 siblings, 0 replies; 55+ messages in thread
From: Hugh Dickins @ 2009-06-21 18:07 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Wu Fengguang, Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Andrew Morton, Rik van Riel, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

Hi Hannes,

On Thu, 18 Jun 2009, Johannes Weiner wrote:
> On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> 
> Okay, evaluating this test-patch any further probably isn't worth it.
> It's too aggressive, I think readahead is stealing pages reclaimed by
> other allocations which in turn oom.
> 
> Back to the original problem: you detected increased latency for
> launching new applications, so they get less share of the IO bandwidth
> than without the patch.
> 
> I can see two reasons for this:
> 
>   a) the new heuristics don't work out and we read more unrelated
>   pages than before
> 
>   b) we readahead more pages in total as the old code would stop at
>   holes, as described above
> 
> We can verify a) by comparing major fault numbers between the two
> kernels with your testload.  If they increase with my patch, we
> anticipate the wrong slots and every fault has do the reading itself.
> 
> b) seems to be a trade-off.  After all, the IO resources you have less
> for new applications in your test is the bandwidth that is used by
> swapping applications.  My qsbench numbers are a sign for this as the
> only IO going on is swap.
> 
> Of course, the theory is not to improve swap performance by increasing
> the readahead window but to choose better readahead candidates.  So I
> will run your tests and qsbench with a smaller page cluster and see if
> this improves both loads.

Hmm, sounds rather pessimistic; but I've not decided about it either.

May I please hand over to you this collection of adjustments to your
v3 virtual swap readahead patch, for you to merge in or split up or
mess around with, generally take ownership of, however you wish?
So you can keep adjusting shmem.c to match memory.c if necessary.

I still think your method looks a very good idea, though results have
not yet convinced me that it necessarily works out better in practice;
and I probably won't be looking at it again for a while.

The base for this patch was 2.6.30 + your v3.

* shmem_getpage() call shmem_swap_cluster() to collect vector of swap
  entries for shmem_swapin(), while we still have them kmap'ped.

* Variable-sized arrays on stack are not popular: I forget whether the
  kernel build still supports any gccs which can't manage them, but they
  do obscure stack usage, and shmem_getpage is already a suspect for that
  (because of the pseudo-vma usage which I hope to remove): should be fine
  while you're experimenting, but in the end let's define PAGE_CLUSTER_MAX.

* Fix "> pmax" in swapin_readahead() to ">= pmax": of course this is
  only a heuristic, so it wasn't accusably wrong; but we are trying for
  a particular range, so it's right to reject < pmin and >= pmax there.

* Kamezawa-san's two one-liners to swap_readahead_ptes(), of course.

* Delete valid_swaphandles() once it's unused (though I can imagine a
  useful test patch in which we could switch between old and new methods).

* swapin_readahead() was always poorly named: while you're changing its
  behaviour, let's take the opportunity to rename it swapin_readaround();
  yes, that triviality would be better as a separate patch.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
---

 include/linux/mm.h   |    6 ++++
 include/linux/swap.h |    5 +--
 kernel/sysctl.c      |    2 -
 mm/memory.c          |   16 ++++++------
 mm/shmem.c           |   47 +++++++++++++++++++++++++++++++++----
 mm/swap_state.c      |   46 +++---------------------------------
 mm/swapfile.c        |   52 -----------------------------------------
 71 files changed, 64 insertions(+), 110 deletions(-)

--- 2.6.30-hv3/include/linux/mm.h	2009-06-10 04:05:27.000000000 +0100
+++ 2.6.30-hv4/include/linux/mm.h	2009-06-21 14:59:27.000000000 +0100
@@ -26,6 +26,12 @@ extern unsigned long max_mapnr;
 
 extern unsigned long num_physpages;
 extern void * high_memory;
+
+/*
+ * page_cluster limits swapin_readaround: tuned by /proc/sys/vm/page-cluster
+ * 1 << page_cluster is the maximum number of pages which may be read
+ */
+#define PAGE_CLUSTER_MAX	5
 extern int page_cluster;
 
 #ifdef CONFIG_SYSCTL
--- 2.6.30-hv3/include/linux/swap.h	2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/include/linux/swap.h	2009-06-21 14:59:27.000000000 +0100
@@ -291,7 +291,7 @@ extern void free_pages_and_swap_cache(st
 extern struct page *lookup_swap_cache(swp_entry_t);
 extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
 			struct vm_area_struct *vma, unsigned long addr);
-extern struct page *swapin_readahead(swp_entry_t, gfp_t,
+extern struct page *swapin_readaround(swp_entry_t, gfp_t,
 			struct vm_area_struct *vma, unsigned long addr,
 			swp_entry_t *entries, int nr_entries,
 			unsigned long cluster);
@@ -303,7 +303,6 @@ extern void si_swapinfo(struct sysinfo *
 extern swp_entry_t get_swap_page(void);
 extern swp_entry_t get_swap_page_of_type(int);
 extern int swap_duplicate(swp_entry_t);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
 extern void swap_free(swp_entry_t);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
@@ -378,7 +377,7 @@ static inline void swap_free(swp_entry_t
 {
 }
 
-static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
+static inline struct page *swapin_readaround(swp_entry_t swp, gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr)
 {
 	return NULL;
--- 2.6.30-hv3/kernel/sysctl.c	2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/kernel/sysctl.c	2009-06-21 14:59:27.000000000 +0100
@@ -112,7 +112,7 @@ static int min_percpu_pagelist_fract = 8
 
 static int ngroups_max = NGROUPS_MAX;
 
-static int page_cluster_max = 5;
+static int page_cluster_max = PAGE_CLUSTER_MAX;
 
 #ifdef CONFIG_MODULES
 extern char modprobe_path[];
--- 2.6.30-hv3/mm/memory.c	2009-06-21 14:55:44.000000000 +0100
+++ 2.6.30-hv4/mm/memory.c	2009-06-21 14:59:27.000000000 +0100
@@ -2440,9 +2440,9 @@ int vmtruncate_range(struct inode *inode
 }
 
 /*
- * The readahead window is the virtual area around the faulting page,
+ * The readaround window is the virtual area around the faulting page,
  * where the physical proximity of the swap slots is taken into
- * account as well in swapin_readahead().
+ * account as well in swapin_readaround().
  *
  * While the swap allocation algorithm tries to keep LRU-related pages
  * together on the swap backing, it is not reliable on heavy thrashing
@@ -2455,7 +2455,7 @@ int vmtruncate_range(struct inode *inode
  * By taking both aspects into account, we get a good approximation of
  * which pages are sensible to read together with the faulting one.
  */
-static int swap_readahead_ptes(struct mm_struct *mm,
+static int swap_readaround_ptes(struct mm_struct *mm,
 			unsigned long addr, pmd_t *pmd,
 			swp_entry_t *entries,
 			unsigned long cluster)
@@ -2467,7 +2467,7 @@ static int swap_readahead_ptes(struct mm
 
 	window = cluster << PAGE_SHIFT;
 	min = addr & ~(window - 1);
-	max = min + cluster;
+	max = min + window;
 	/*
 	 * To keep the locking/highpte mapping simple, stay
 	 * within the PTE range of one PMD entry.
@@ -2478,7 +2478,7 @@ static int swap_readahead_ptes(struct mm
 	limit = pmd_addr_end(addr, max);
 	if (limit < max)
 		max = limit;
-	limit = max - min;
+	limit = (max - min) >> PAGE_SHIFT;
 	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
 	for (i = nr = 0; i < limit; i++)
 		if (is_swap_pte(ptep[i]))
@@ -2515,11 +2515,11 @@ static int do_swap_page(struct mm_struct
 	page = lookup_swap_cache(entry);
 	if (!page) {
 		int nr, cluster = 1 << page_cluster;
-		swp_entry_t entries[cluster];
+		swp_entry_t entries[1 << PAGE_CLUSTER_MAX];
 
 		grab_swap_token(); /* Contend for token _before_ read-in */
-		nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
-		page = swapin_readahead(entry,
+		nr = swap_readaround_ptes(mm, address, pmd, entries, cluster);
+		page = swapin_readaround(entry,
 					GFP_HIGHUSER_MOVABLE, vma, address,
 					entries, nr, cluster);
 		if (!page) {
--- 2.6.30-hv3/mm/shmem.c	2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/mm/shmem.c	2009-06-21 14:59:27.000000000 +0100
@@ -1134,7 +1134,8 @@ static struct mempolicy *shmem_get_sbmpo
 #endif /* CONFIG_TMPFS */
 
 static struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
-			struct shmem_inode_info *info, unsigned long idx)
+		struct shmem_inode_info *info, unsigned long idx,
+		swp_entry_t *entries, int nr_entries, unsigned long cluster)
 {
 	struct mempolicy mpol, *spol;
 	struct vm_area_struct pvma;
@@ -1148,7 +1149,8 @@ static struct page *shmem_swapin(swp_ent
 	pvma.vm_pgoff = idx;
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = spol;
-	page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
+	page = swapin_readaround(entry, gfp, &pvma, 0,
+				entries, nr_entries, cluster);
 	return page;
 }
 
@@ -1176,9 +1178,11 @@ static inline void shmem_show_mpol(struc
 #endif /* CONFIG_TMPFS */
 
 static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
-			struct shmem_inode_info *info, unsigned long idx)
+		struct shmem_inode_info *info, unsigned long idx,
+		swp_entry_t *entries, int nr_entries, unsigned long cluster)
 {
-	return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
+	return swapin_readaround(entry, gfp, NULL, 0,
+				entries, nr_entries, cluster);
 }
 
 static inline struct page *shmem_alloc_page(gfp_t gfp,
@@ -1195,6 +1199,33 @@ static inline struct mempolicy *shmem_ge
 }
 #endif
 
+static int shmem_swap_cluster(swp_entry_t *entry, unsigned long idx,
+				swp_entry_t *entries, unsigned long cluster)
+{
+	unsigned long min, max, limit;
+	int i, nr;
+
+	limit = SHMEM_NR_DIRECT;
+	if (idx >= SHMEM_NR_DIRECT) {
+		idx -= SHMEM_NR_DIRECT;
+		idx %= ENTRIES_PER_PAGE;
+		limit = ENTRIES_PER_PAGE;
+	}
+
+	min = idx & ~(cluster - 1);
+	max = min + cluster;
+	if (max > limit)
+		max = limit;
+	entry -= (idx - min);
+	limit = max - min;
+
+	for (i = nr = 0; i < limit; i++) {
+		if (entry[i].val)
+			entries[nr++] = entry[i];
+	}
+	return nr;
+}
+
 /*
  * shmem_getpage - either get the page from swap or allocate a new one
  *
@@ -1261,6 +1292,11 @@ repeat:
 		/* Look it up and read it in.. */
 		swappage = lookup_swap_cache(swap);
 		if (!swappage) {
+			int nr_entries, cluster = 1 << page_cluster;
+			swp_entry_t entries[1 << PAGE_CLUSTER_MAX];
+
+			nr_entries = shmem_swap_cluster(entry, idx,
+							entries, cluster);
 			shmem_swp_unmap(entry);
 			/* here we actually do the io */
 			if (type && !(*type & VM_FAULT_MAJOR)) {
@@ -1268,7 +1304,8 @@ repeat:
 				*type |= VM_FAULT_MAJOR;
 			}
 			spin_unlock(&info->lock);
-			swappage = shmem_swapin(swap, gfp, info, idx);
+			swappage = shmem_swapin(swap, gfp, info, idx,
+						entries, nr_entries, cluster);
 			if (!swappage) {
 				spin_lock(&info->lock);
 				entry = shmem_swp_alloc(info, idx, sgp);
--- 2.6.30-hv3/mm/swap_state.c	2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/mm/swap_state.c	2009-06-21 14:59:27.000000000 +0100
@@ -325,58 +325,24 @@ struct page *read_swap_cache_async(swp_e
 	return found_page;
 }
 
-/*
- * Primitive swap readahead code. We simply read an aligned block of
- * (1 << page_cluster) entries in the swap area. This method is chosen
- * because it doesn't cost us any seek time.  We also make sure to queue
- * the 'original' request together with the readahead ones...
- */
-static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
-				struct vm_area_struct *vma, unsigned long addr)
-{
-	int nr_pages;
-	struct page *page;
-	unsigned long offset;
-	unsigned long end_offset;
-
-	/*
-	 * Get starting offset for readaround, and number of pages to read.
-	 * Adjust starting address by readbehind (for NUMA interleave case)?
-	 * No, it's very unlikely that swap layout would follow vma layout,
-	 * more likely that neighbouring swap pages came from the same node:
-	 * so use the same "addr" to choose the same node for each swap read.
-	 */
-	nr_pages = valid_swaphandles(entry, &offset);
-	for (end_offset = offset + nr_pages; offset < end_offset; offset++) {
-		/* Ok, do the async read-ahead now */
-		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
-						gfp_mask, vma, addr);
-		if (!page)
-			break;
-		page_cache_release(page);
-	}
-	lru_add_drain();	/* Push any new pages onto the LRU now */
-	return read_swap_cache_async(entry, gfp_mask, vma, addr);
-}
-
 /**
- * swapin_readahead - swap in pages in hope we need them soon
+ * swapin_readaround - swap in pages in hope we need them soon
  * @entry: swap entry of this memory
  * @gfp_mask: memory allocation flags
  * @vma: user vma this address belongs to
  * @addr: target address for mempolicy
  * @entries: swap slots to consider reading
  * @nr_entries: number of @entries
- * @cluster: readahead window size in swap slots
+ * @cluster: readaround window size in swap slots
  *
  * Returns the struct page for entry and addr, after queueing swapin.
  *
  * This has been extended to use the NUMA policies from the mm
- * triggering the readahead.
+ * triggering the readaround.
  *
  * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
  */
-struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
+struct page *swapin_readaround(swp_entry_t entry, gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr,
 			swp_entry_t *entries, int nr_entries,
 			unsigned long cluster)
@@ -384,8 +350,6 @@ struct page *swapin_readahead(swp_entry_
 	unsigned long pmin, pmax;
 	int i;
 
-	if (!entries)	/* XXX: shmem case */
-		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
 	pmin = swp_offset(entry) & ~(cluster - 1);
 	pmax = pmin + cluster;
 	for (i = 0; i < nr_entries; i++) {
@@ -394,7 +358,7 @@ struct page *swapin_readahead(swp_entry_
 
 		if (swp_type(swp) != swp_type(entry))
 			continue;
-		if (swp_offset(swp) > pmax)
+		if (swp_offset(swp) >= pmax)
 			continue;
 		if (swp_offset(swp) < pmin)
 			continue;
--- 2.6.30-hv3/mm/swapfile.c	2009-03-23 23:12:14.000000000 +0000
+++ 2.6.30-hv4/mm/swapfile.c	2009-06-21 14:59:27.000000000 +0100
@@ -1984,55 +1984,3 @@ get_swap_info_struct(unsigned type)
 {
 	return &swap_info[type];
 }
-
-/*
- * swap_lock prevents swap_map being freed. Don't grab an extra
- * reference on the swaphandle, it doesn't matter if it becomes unused.
- */
-int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
-{
-	struct swap_info_struct *si;
-	int our_page_cluster = page_cluster;
-	pgoff_t target, toff;
-	pgoff_t base, end;
-	int nr_pages = 0;
-
-	if (!our_page_cluster)	/* no readahead */
-		return 0;
-
-	si = &swap_info[swp_type(entry)];
-	target = swp_offset(entry);
-	base = (target >> our_page_cluster) << our_page_cluster;
-	end = base + (1 << our_page_cluster);
-	if (!base)		/* first page is swap header */
-		base++;
-
-	spin_lock(&swap_lock);
-	if (end > si->max)	/* don't go beyond end of map */
-		end = si->max;
-
-	/* Count contiguous allocated slots above our target */
-	for (toff = target; ++toff < end; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (si->swap_map[toff] == SWAP_MAP_BAD)
-			break;
-	}
-	/* Count contiguous allocated slots below our target */
-	for (toff = target; --toff >= base; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (si->swap_map[toff] == SWAP_MAP_BAD)
-			break;
-	}
-	spin_unlock(&swap_lock);
-
-	/*
-	 * Indicate starting offset, and return number of pages to get:
-	 * if only 1, say 0, since there's then no readahead to be done.
-	 */
-	*offset = ++toff;
-	return nr_pages? ++nr_pages: 0;
-}

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-21 18:07                                     ` Hugh Dickins
  0 siblings, 0 replies; 55+ messages in thread
From: Hugh Dickins @ 2009-06-21 18:07 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Wu Fengguang, Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Andrew Morton, Rik van Riel, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

Hi Hannes,

On Thu, 18 Jun 2009, Johannes Weiner wrote:
> On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> 
> Okay, evaluating this test-patch any further probably isn't worth it.
> It's too aggressive, I think readahead is stealing pages reclaimed by
> other allocations which in turn oom.
> 
> Back to the original problem: you detected increased latency for
> launching new applications, so they get less share of the IO bandwidth
> than without the patch.
> 
> I can see two reasons for this:
> 
>   a) the new heuristics don't work out and we read more unrelated
>   pages than before
> 
>   b) we readahead more pages in total as the old code would stop at
>   holes, as described above
> 
> We can verify a) by comparing major fault numbers between the two
> kernels with your testload.  If they increase with my patch, we
> anticipate the wrong slots and every fault has do the reading itself.
> 
> b) seems to be a trade-off.  After all, the IO resources you have less
> for new applications in your test is the bandwidth that is used by
> swapping applications.  My qsbench numbers are a sign for this as the
> only IO going on is swap.
> 
> Of course, the theory is not to improve swap performance by increasing
> the readahead window but to choose better readahead candidates.  So I
> will run your tests and qsbench with a smaller page cluster and see if
> this improves both loads.

Hmm, sounds rather pessimistic; but I've not decided about it either.

May I please hand over to you this collection of adjustments to your
v3 virtual swap readahead patch, for you to merge in or split up or
mess around with, generally take ownership of, however you wish?
So you can keep adjusting shmem.c to match memory.c if necessary.

I still think your method looks a very good idea, though results have
not yet convinced me that it necessarily works out better in practice;
and I probably won't be looking at it again for a while.

The base for this patch was 2.6.30 + your v3.

* shmem_getpage() call shmem_swap_cluster() to collect vector of swap
  entries for shmem_swapin(), while we still have them kmap'ped.

* Variable-sized arrays on stack are not popular: I forget whether the
  kernel build still supports any gccs which can't manage them, but they
  do obscure stack usage, and shmem_getpage is already a suspect for that
  (because of the pseudo-vma usage which I hope to remove): should be fine
  while you're experimenting, but in the end let's define PAGE_CLUSTER_MAX.

* Fix "> pmax" in swapin_readahead() to ">= pmax": of course this is
  only a heuristic, so it wasn't accusably wrong; but we are trying for
  a particular range, so it's right to reject < pmin and >= pmax there.

* Kamezawa-san's two one-liners to swap_readahead_ptes(), of course.

* Delete valid_swaphandles() once it's unused (though I can imagine a
  useful test patch in which we could switch between old and new methods).

* swapin_readahead() was always poorly named: while you're changing its
  behaviour, let's take the opportunity to rename it swapin_readaround();
  yes, that triviality would be better as a separate patch.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
---

 include/linux/mm.h   |    6 ++++
 include/linux/swap.h |    5 +--
 kernel/sysctl.c      |    2 -
 mm/memory.c          |   16 ++++++------
 mm/shmem.c           |   47 +++++++++++++++++++++++++++++++++----
 mm/swap_state.c      |   46 +++---------------------------------
 mm/swapfile.c        |   52 -----------------------------------------
 71 files changed, 64 insertions(+), 110 deletions(-)

--- 2.6.30-hv3/include/linux/mm.h	2009-06-10 04:05:27.000000000 +0100
+++ 2.6.30-hv4/include/linux/mm.h	2009-06-21 14:59:27.000000000 +0100
@@ -26,6 +26,12 @@ extern unsigned long max_mapnr;
 
 extern unsigned long num_physpages;
 extern void * high_memory;
+
+/*
+ * page_cluster limits swapin_readaround: tuned by /proc/sys/vm/page-cluster
+ * 1 << page_cluster is the maximum number of pages which may be read
+ */
+#define PAGE_CLUSTER_MAX	5
 extern int page_cluster;
 
 #ifdef CONFIG_SYSCTL
--- 2.6.30-hv3/include/linux/swap.h	2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/include/linux/swap.h	2009-06-21 14:59:27.000000000 +0100
@@ -291,7 +291,7 @@ extern void free_pages_and_swap_cache(st
 extern struct page *lookup_swap_cache(swp_entry_t);
 extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
 			struct vm_area_struct *vma, unsigned long addr);
-extern struct page *swapin_readahead(swp_entry_t, gfp_t,
+extern struct page *swapin_readaround(swp_entry_t, gfp_t,
 			struct vm_area_struct *vma, unsigned long addr,
 			swp_entry_t *entries, int nr_entries,
 			unsigned long cluster);
@@ -303,7 +303,6 @@ extern void si_swapinfo(struct sysinfo *
 extern swp_entry_t get_swap_page(void);
 extern swp_entry_t get_swap_page_of_type(int);
 extern int swap_duplicate(swp_entry_t);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
 extern void swap_free(swp_entry_t);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
@@ -378,7 +377,7 @@ static inline void swap_free(swp_entry_t
 {
 }
 
-static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
+static inline struct page *swapin_readaround(swp_entry_t swp, gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr)
 {
 	return NULL;
--- 2.6.30-hv3/kernel/sysctl.c	2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/kernel/sysctl.c	2009-06-21 14:59:27.000000000 +0100
@@ -112,7 +112,7 @@ static int min_percpu_pagelist_fract = 8
 
 static int ngroups_max = NGROUPS_MAX;
 
-static int page_cluster_max = 5;
+static int page_cluster_max = PAGE_CLUSTER_MAX;
 
 #ifdef CONFIG_MODULES
 extern char modprobe_path[];
--- 2.6.30-hv3/mm/memory.c	2009-06-21 14:55:44.000000000 +0100
+++ 2.6.30-hv4/mm/memory.c	2009-06-21 14:59:27.000000000 +0100
@@ -2440,9 +2440,9 @@ int vmtruncate_range(struct inode *inode
 }
 
 /*
- * The readahead window is the virtual area around the faulting page,
+ * The readaround window is the virtual area around the faulting page,
  * where the physical proximity of the swap slots is taken into
- * account as well in swapin_readahead().
+ * account as well in swapin_readaround().
  *
  * While the swap allocation algorithm tries to keep LRU-related pages
  * together on the swap backing, it is not reliable on heavy thrashing
@@ -2455,7 +2455,7 @@ int vmtruncate_range(struct inode *inode
  * By taking both aspects into account, we get a good approximation of
  * which pages are sensible to read together with the faulting one.
  */
-static int swap_readahead_ptes(struct mm_struct *mm,
+static int swap_readaround_ptes(struct mm_struct *mm,
 			unsigned long addr, pmd_t *pmd,
 			swp_entry_t *entries,
 			unsigned long cluster)
@@ -2467,7 +2467,7 @@ static int swap_readahead_ptes(struct mm
 
 	window = cluster << PAGE_SHIFT;
 	min = addr & ~(window - 1);
-	max = min + cluster;
+	max = min + window;
 	/*
 	 * To keep the locking/highpte mapping simple, stay
 	 * within the PTE range of one PMD entry.
@@ -2478,7 +2478,7 @@ static int swap_readahead_ptes(struct mm
 	limit = pmd_addr_end(addr, max);
 	if (limit < max)
 		max = limit;
-	limit = max - min;
+	limit = (max - min) >> PAGE_SHIFT;
 	ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
 	for (i = nr = 0; i < limit; i++)
 		if (is_swap_pte(ptep[i]))
@@ -2515,11 +2515,11 @@ static int do_swap_page(struct mm_struct
 	page = lookup_swap_cache(entry);
 	if (!page) {
 		int nr, cluster = 1 << page_cluster;
-		swp_entry_t entries[cluster];
+		swp_entry_t entries[1 << PAGE_CLUSTER_MAX];
 
 		grab_swap_token(); /* Contend for token _before_ read-in */
-		nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
-		page = swapin_readahead(entry,
+		nr = swap_readaround_ptes(mm, address, pmd, entries, cluster);
+		page = swapin_readaround(entry,
 					GFP_HIGHUSER_MOVABLE, vma, address,
 					entries, nr, cluster);
 		if (!page) {
--- 2.6.30-hv3/mm/shmem.c	2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/mm/shmem.c	2009-06-21 14:59:27.000000000 +0100
@@ -1134,7 +1134,8 @@ static struct mempolicy *shmem_get_sbmpo
 #endif /* CONFIG_TMPFS */
 
 static struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
-			struct shmem_inode_info *info, unsigned long idx)
+		struct shmem_inode_info *info, unsigned long idx,
+		swp_entry_t *entries, int nr_entries, unsigned long cluster)
 {
 	struct mempolicy mpol, *spol;
 	struct vm_area_struct pvma;
@@ -1148,7 +1149,8 @@ static struct page *shmem_swapin(swp_ent
 	pvma.vm_pgoff = idx;
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = spol;
-	page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
+	page = swapin_readaround(entry, gfp, &pvma, 0,
+				entries, nr_entries, cluster);
 	return page;
 }
 
@@ -1176,9 +1178,11 @@ static inline void shmem_show_mpol(struc
 #endif /* CONFIG_TMPFS */
 
 static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
-			struct shmem_inode_info *info, unsigned long idx)
+		struct shmem_inode_info *info, unsigned long idx,
+		swp_entry_t *entries, int nr_entries, unsigned long cluster)
 {
-	return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
+	return swapin_readaround(entry, gfp, NULL, 0,
+				entries, nr_entries, cluster);
 }
 
 static inline struct page *shmem_alloc_page(gfp_t gfp,
@@ -1195,6 +1199,33 @@ static inline struct mempolicy *shmem_ge
 }
 #endif
 
+static int shmem_swap_cluster(swp_entry_t *entry, unsigned long idx,
+				swp_entry_t *entries, unsigned long cluster)
+{
+	unsigned long min, max, limit;
+	int i, nr;
+
+	limit = SHMEM_NR_DIRECT;
+	if (idx >= SHMEM_NR_DIRECT) {
+		idx -= SHMEM_NR_DIRECT;
+		idx %= ENTRIES_PER_PAGE;
+		limit = ENTRIES_PER_PAGE;
+	}
+
+	min = idx & ~(cluster - 1);
+	max = min + cluster;
+	if (max > limit)
+		max = limit;
+	entry -= (idx - min);
+	limit = max - min;
+
+	for (i = nr = 0; i < limit; i++) {
+		if (entry[i].val)
+			entries[nr++] = entry[i];
+	}
+	return nr;
+}
+
 /*
  * shmem_getpage - either get the page from swap or allocate a new one
  *
@@ -1261,6 +1292,11 @@ repeat:
 		/* Look it up and read it in.. */
 		swappage = lookup_swap_cache(swap);
 		if (!swappage) {
+			int nr_entries, cluster = 1 << page_cluster;
+			swp_entry_t entries[1 << PAGE_CLUSTER_MAX];
+
+			nr_entries = shmem_swap_cluster(entry, idx,
+							entries, cluster);
 			shmem_swp_unmap(entry);
 			/* here we actually do the io */
 			if (type && !(*type & VM_FAULT_MAJOR)) {
@@ -1268,7 +1304,8 @@ repeat:
 				*type |= VM_FAULT_MAJOR;
 			}
 			spin_unlock(&info->lock);
-			swappage = shmem_swapin(swap, gfp, info, idx);
+			swappage = shmem_swapin(swap, gfp, info, idx,
+						entries, nr_entries, cluster);
 			if (!swappage) {
 				spin_lock(&info->lock);
 				entry = shmem_swp_alloc(info, idx, sgp);
--- 2.6.30-hv3/mm/swap_state.c	2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/mm/swap_state.c	2009-06-21 14:59:27.000000000 +0100
@@ -325,58 +325,24 @@ struct page *read_swap_cache_async(swp_e
 	return found_page;
 }
 
-/*
- * Primitive swap readahead code. We simply read an aligned block of
- * (1 << page_cluster) entries in the swap area. This method is chosen
- * because it doesn't cost us any seek time.  We also make sure to queue
- * the 'original' request together with the readahead ones...
- */
-static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
-				struct vm_area_struct *vma, unsigned long addr)
-{
-	int nr_pages;
-	struct page *page;
-	unsigned long offset;
-	unsigned long end_offset;
-
-	/*
-	 * Get starting offset for readaround, and number of pages to read.
-	 * Adjust starting address by readbehind (for NUMA interleave case)?
-	 * No, it's very unlikely that swap layout would follow vma layout,
-	 * more likely that neighbouring swap pages came from the same node:
-	 * so use the same "addr" to choose the same node for each swap read.
-	 */
-	nr_pages = valid_swaphandles(entry, &offset);
-	for (end_offset = offset + nr_pages; offset < end_offset; offset++) {
-		/* Ok, do the async read-ahead now */
-		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
-						gfp_mask, vma, addr);
-		if (!page)
-			break;
-		page_cache_release(page);
-	}
-	lru_add_drain();	/* Push any new pages onto the LRU now */
-	return read_swap_cache_async(entry, gfp_mask, vma, addr);
-}
-
 /**
- * swapin_readahead - swap in pages in hope we need them soon
+ * swapin_readaround - swap in pages in hope we need them soon
  * @entry: swap entry of this memory
  * @gfp_mask: memory allocation flags
  * @vma: user vma this address belongs to
  * @addr: target address for mempolicy
  * @entries: swap slots to consider reading
  * @nr_entries: number of @entries
- * @cluster: readahead window size in swap slots
+ * @cluster: readaround window size in swap slots
  *
  * Returns the struct page for entry and addr, after queueing swapin.
  *
  * This has been extended to use the NUMA policies from the mm
- * triggering the readahead.
+ * triggering the readaround.
  *
  * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
  */
-struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
+struct page *swapin_readaround(swp_entry_t entry, gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr,
 			swp_entry_t *entries, int nr_entries,
 			unsigned long cluster)
@@ -384,8 +350,6 @@ struct page *swapin_readahead(swp_entry_
 	unsigned long pmin, pmax;
 	int i;
 
-	if (!entries)	/* XXX: shmem case */
-		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
 	pmin = swp_offset(entry) & ~(cluster - 1);
 	pmax = pmin + cluster;
 	for (i = 0; i < nr_entries; i++) {
@@ -394,7 +358,7 @@ struct page *swapin_readahead(swp_entry_
 
 		if (swp_type(swp) != swp_type(entry))
 			continue;
-		if (swp_offset(swp) > pmax)
+		if (swp_offset(swp) >= pmax)
 			continue;
 		if (swp_offset(swp) < pmin)
 			continue;
--- 2.6.30-hv3/mm/swapfile.c	2009-03-23 23:12:14.000000000 +0000
+++ 2.6.30-hv4/mm/swapfile.c	2009-06-21 14:59:27.000000000 +0100
@@ -1984,55 +1984,3 @@ get_swap_info_struct(unsigned type)
 {
 	return &swap_info[type];
 }
-
-/*
- * swap_lock prevents swap_map being freed. Don't grab an extra
- * reference on the swaphandle, it doesn't matter if it becomes unused.
- */
-int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
-{
-	struct swap_info_struct *si;
-	int our_page_cluster = page_cluster;
-	pgoff_t target, toff;
-	pgoff_t base, end;
-	int nr_pages = 0;
-
-	if (!our_page_cluster)	/* no readahead */
-		return 0;
-
-	si = &swap_info[swp_type(entry)];
-	target = swp_offset(entry);
-	base = (target >> our_page_cluster) << our_page_cluster;
-	end = base + (1 << our_page_cluster);
-	if (!base)		/* first page is swap header */
-		base++;
-
-	spin_lock(&swap_lock);
-	if (end > si->max)	/* don't go beyond end of map */
-		end = si->max;
-
-	/* Count contiguous allocated slots above our target */
-	for (toff = target; ++toff < end; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (si->swap_map[toff] == SWAP_MAP_BAD)
-			break;
-	}
-	/* Count contiguous allocated slots below our target */
-	for (toff = target; --toff >= base; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (si->swap_map[toff] == SWAP_MAP_BAD)
-			break;
-	}
-	spin_unlock(&swap_lock);
-
-	/*
-	 * Indicate starting offset, and return number of pages to get:
-	 * if only 1, say 0, since there's then no readahead to be done.
-	 */
-	*offset = ++toff;
-	return nr_pages? ++nr_pages: 0;
-}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
  2009-06-21 18:07                                     ` Hugh Dickins
@ 2009-06-21 18:37                                       ` Johannes Weiner
  -1 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-21 18:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Wu Fengguang, Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Andrew Morton, Rik van Riel, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Sun, Jun 21, 2009 at 07:07:03PM +0100, Hugh Dickins wrote:
> Hi Hannes,
> 
> On Thu, 18 Jun 2009, Johannes Weiner wrote:
> > On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> > 
> > Okay, evaluating this test-patch any further probably isn't worth it.
> > It's too aggressive, I think readahead is stealing pages reclaimed by
> > other allocations which in turn oom.
> > 
> > Back to the original problem: you detected increased latency for
> > launching new applications, so they get less share of the IO bandwidth
> > than without the patch.
> > 
> > I can see two reasons for this:
> > 
> >   a) the new heuristics don't work out and we read more unrelated
> >   pages than before
> > 
> >   b) we readahead more pages in total as the old code would stop at
> >   holes, as described above
> > 
> > We can verify a) by comparing major fault numbers between the two
> > kernels with your testload.  If they increase with my patch, we
> > anticipate the wrong slots and every fault has do the reading itself.
> > 
> > b) seems to be a trade-off.  After all, the IO resources you have less
> > for new applications in your test is the bandwidth that is used by
> > swapping applications.  My qsbench numbers are a sign for this as the
> > only IO going on is swap.
> > 
> > Of course, the theory is not to improve swap performance by increasing
> > the readahead window but to choose better readahead candidates.  So I
> > will run your tests and qsbench with a smaller page cluster and see if
> > this improves both loads.
> 
> Hmm, sounds rather pessimistic; but I've not decided about it either.

It seems the problem was not that real after all:

	http://lkml.org/lkml/2009/6/18/109

> May I please hand over to you this collection of adjustments to your
> v3 virtual swap readahead patch, for you to merge in or split up or
> mess around with, generally take ownership of, however you wish?
> So you can keep adjusting shmem.c to match memory.c if necessary.

I will adopt them, thank you!

	Hannes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [patch v3] swap: virtual swap readahead
@ 2009-06-21 18:37                                       ` Johannes Weiner
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Weiner @ 2009-06-21 18:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Wu Fengguang, Barnes, Jesse, Peter Zijlstra, KAMEZAWA Hiroyuki,
	Andrew Morton, Rik van Riel, Andi Kleen, Minchan Kim, linux-mm,
	linux-kernel

On Sun, Jun 21, 2009 at 07:07:03PM +0100, Hugh Dickins wrote:
> Hi Hannes,
> 
> On Thu, 18 Jun 2009, Johannes Weiner wrote:
> > On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> > 
> > Okay, evaluating this test-patch any further probably isn't worth it.
> > It's too aggressive, I think readahead is stealing pages reclaimed by
> > other allocations which in turn oom.
> > 
> > Back to the original problem: you detected increased latency for
> > launching new applications, so they get less share of the IO bandwidth
> > than without the patch.
> > 
> > I can see two reasons for this:
> > 
> >   a) the new heuristics don't work out and we read more unrelated
> >   pages than before
> > 
> >   b) we readahead more pages in total as the old code would stop at
> >   holes, as described above
> > 
> > We can verify a) by comparing major fault numbers between the two
> > kernels with your testload.  If they increase with my patch, we
> > anticipate the wrong slots and every fault has do the reading itself.
> > 
> > b) seems to be a trade-off.  After all, the IO resources you have less
> > for new applications in your test is the bandwidth that is used by
> > swapping applications.  My qsbench numbers are a sign for this as the
> > only IO going on is swap.
> > 
> > Of course, the theory is not to improve swap performance by increasing
> > the readahead window but to choose better readahead candidates.  So I
> > will run your tests and qsbench with a smaller page cluster and see if
> > this improves both loads.
> 
> Hmm, sounds rather pessimistic; but I've not decided about it either.

It seems the problem was not that real after all:

	http://lkml.org/lkml/2009/6/18/109

> May I please hand over to you this collection of adjustments to your
> v3 virtual swap readahead patch, for you to merge in or split up or
> mess around with, generally take ownership of, however you wish?
> So you can keep adjusting shmem.c to match memory.c if necessary.

I will adopt them, thank you!

	Hannes

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2009-06-21 18:41 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-09 19:01 [patch v3] swap: virtual swap readahead Johannes Weiner
2009-06-09 19:01 ` Johannes Weiner
2009-06-09 19:37 ` Johannes Weiner
2009-06-09 19:37   ` Johannes Weiner
2009-06-10  5:03   ` Wu Fengguang
2009-06-10  5:03     ` Wu Fengguang
2009-06-10  7:45     ` Johannes Weiner
2009-06-10  7:45       ` Johannes Weiner
2009-06-10  8:11       ` Wu Fengguang
2009-06-10  8:11         ` Wu Fengguang
2009-06-10  8:32         ` KAMEZAWA Hiroyuki
2009-06-10  8:32           ` KAMEZAWA Hiroyuki
2009-06-10  8:56           ` Wu Fengguang
2009-06-10  8:56             ` Wu Fengguang
2009-06-10  9:42             ` Peter Zijlstra
2009-06-10  9:42               ` Peter Zijlstra
2009-06-10  9:59               ` Wu Fengguang
2009-06-10  9:59                 ` Wu Fengguang
2009-06-10 10:05                 ` Peter Zijlstra
2009-06-10 10:05                   ` Peter Zijlstra
2009-06-10 11:32                   ` Wu Fengguang
2009-06-10 11:32                     ` Wu Fengguang
2009-06-10 17:25                     ` Jesse Barnes
2009-06-10 17:25                       ` Jesse Barnes
2009-06-11  5:22                       ` Wu Fengguang
2009-06-11  5:22                         ` Wu Fengguang
2009-06-11 10:17                         ` Johannes Weiner
2009-06-11 10:17                           ` Johannes Weiner
2009-06-12  1:59                           ` Wu Fengguang
2009-06-12  1:59                             ` Wu Fengguang
2009-06-15 18:22                             ` Johannes Weiner
2009-06-15 18:22                               ` Johannes Weiner
2009-06-18  9:19                               ` Wu Fengguang
2009-06-18  9:19                                 ` Wu Fengguang
2009-06-18 13:01                                 ` Johannes Weiner
2009-06-18 13:01                                   ` Johannes Weiner
2009-06-19  3:30                                   ` Wu Fengguang
2009-06-19  3:30                                     ` Wu Fengguang
2009-06-21 18:07                                   ` Hugh Dickins
2009-06-21 18:07                                     ` Hugh Dickins
2009-06-21 18:37                                     ` Johannes Weiner
2009-06-21 18:37                                       ` Johannes Weiner
2009-06-10  9:30           ` Johannes Weiner
2009-06-10  9:30             ` Johannes Weiner
2009-06-10  6:39   ` KAMEZAWA Hiroyuki
2009-06-10  6:39     ` KAMEZAWA Hiroyuki
2009-06-11  5:31 ` KAMEZAWA Hiroyuki
2009-06-11  5:31   ` KAMEZAWA Hiroyuki
2009-06-17 22:41   ` Johannes Weiner
2009-06-17 22:41     ` Johannes Weiner
2009-06-18  9:29     ` Wu Fengguang
2009-06-18  9:29       ` Wu Fengguang
2009-06-18 13:09       ` Johannes Weiner
2009-06-18 13:09         ` Johannes Weiner
2009-06-19  3:17         ` Wu Fengguang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.