linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 00/14] filemap and readahead fixes
@ 2009-04-07  7:17 Wu Fengguang
  2009-04-07  7:17 ` [PATCH 01/14] mm: fix find_lock_page_retry() return value parsing Wu Fengguang
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ying Han, LKML, linux-fsdevel, linux-mm

Andrew,

This is a set of fixes and cleanups for filemap and readahead.
They are for 2.6.29-rc8-mm1 and have been carefully tested.

filemap VM_FAULT_RETRY fixes
----------------------------
        [PATCH 01/14] mm: fix find_lock_page_retry() return value parsing
        [PATCH 02/14] mm: fix major/minor fault accounting on retried fault
        [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
        [PATCH 04/14] mm: reduce duplicate page fault code
        [PATCH 05/14] readahead: account mmap_miss for VM_FAULT_RETRY

readahead fixes
---------------
minor cleanups:
        [PATCH 06/14] readahead: move max_sane_readahead() calls into force_page_cache_readahead()
        [PATCH 07/14] readahead: apply max_sane_readahead() limit in ondemand_readahead()
        [PATCH 08/14] readahead: remove one unnecessary radix tree lookup

behavior changes necessary for the following mmap readahead:
        [PATCH 09/14] readahead: increase interleaved readahead size
        [PATCH 10/14] readahead: remove sync/async readahead call dependency

mmap readaround/readahead
-------------------------
major cleanups from Linus:
(the cleanups automatically fix a PGMAJFAULT accounting bug in the VM_RAND_READ case)
        [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead

and my further steps:
        [PATCH 12/14] readahead: sequential mmap readahead
        [PATCH 13/14] readahead: enforce full readahead size on async mmap readahead
        [PATCH 14/14] readahead: record mmap read-around states in file_ra_state

Thanks,
Fengguang
-- 


* [PATCH 01/14] mm: fix find_lock_page_retry() return value parsing
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 02/14] mm: fix major/minor fault accounting on retried fault Wu Fengguang
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Wu Fengguang

[-- Attachment #1: filemap-fault-fix.patch --]
[-- Type: text/plain, Size: 2136 bytes --]

find_lock_page_retry() won't touch the *ppage value when returning
VM_FAULT_RETRY. So in the filemap_fault():no_cached_page case,
'page' could be left undefined after calling find_lock_page_retry().

Fix it by checking for the VM_FAULT_RETRY case first.
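
To illustrate the hazard (a sketch of the calling convention, not the
literal kernel code; names as in the patch below): if a caller tests
'page' before checking the return value, it reads a pointer that the
retry path never wrote:

	struct page *page;	/* not written by the VM_FAULT_RETRY path */
	int retry_ret;

	retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
					 vma, &page, retry_flag);
	if (retry_ret == VM_FAULT_RETRY)	/* must be checked first */
		return retry_ret;
	if (!page)		/* only now is 'page' well defined */
		goto no_cached_page;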

Cc: Ying Han <yinghan@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/filemap.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -759,7 +759,7 @@ EXPORT_SYMBOL(find_lock_page);
  * @retry: 1 indicate caller tolerate a retry.
  *
  * If retry flag is on, and page is already locked by someone else, return
- * a hint of retry.
+ * a hint of retry and leave *ppage untouched.
  *
  * Return *ppage==NULL if page is not in pagecache. Otherwise return *ppage
  * points to the page in the pagecache with ret=VM_FAULT_RETRY indicate a
@@ -1575,10 +1575,10 @@ retry_find_nopage:
 							   vmf->pgoff, 1);
 			retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
 						vma, &page, retry_flag);
-			if (!page)
-				goto no_cached_page;
 			if (retry_ret == VM_FAULT_RETRY)
 				return retry_ret;
+			if (!page)
+				goto no_cached_page;
 		}
 		if (PageReadahead(page)) {
 			page_cache_async_readahead(mapping, ra, file, page,
@@ -1617,10 +1617,10 @@ retry_find_nopage:
 		}
 		retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
 				vma, &page, retry_flag);
-		if (!page)
-			goto no_cached_page;
 		if (retry_ret == VM_FAULT_RETRY)
 			return retry_ret;
+		if (!page)
+			goto no_cached_page;
 	}
 
 	if (!did_readaround)
@@ -1672,10 +1672,10 @@ no_cached_page:
 
 		retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
 					vma, &page, retry_flag);
+		if (retry_ret == VM_FAULT_RETRY)
+			return retry_ret;
 		if (!page)
 			goto retry_find_nopage;
-		else if (retry_ret == VM_FAULT_RETRY)
-			return retry_ret;
 		else
 			goto retry_page_update;
 	}

-- 


* [PATCH 02/14] mm: fix major/minor fault accounting on retried fault
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
  2009-04-07  7:17 ` [PATCH 01/14] mm: fix find_lock_page_retry() return value parsing Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07 19:58   ` Ying Han
  2009-04-07  7:17 ` [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code Wu Fengguang
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Wu Fengguang

[-- Attachment #1: filemap-major-fault-retry.patch --]
[-- Type: text/plain, Size: 1783 bytes --]

VM_FAULT_RETRY makes major/minor fault accounting a bit twisted: the attempt
that returns VM_FAULT_RETRY is the one that starts the real I/O, yet the
subsequent retry finds the now-cached page and is accounted as a minor fault.
So transfer the count on retry (maj_flt++, min_flt--), and count the PGFAULT
event only once per logical fault.
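
A simplified timeline of the intended accounting (a sketch pieced together
from this patch and the discussion below, not literal kernel code):

	1st attempt: page not cached -> I/O is started and VM_FAULT_RETRY
	             returned; no PGFAULT event is counted
	fixup:       tsk->maj_flt++; tsk->min_flt--;
	retry:       the page is now cached -> minor fault path, which
	             does tsk->min_flt++ and counts one PGFAULT
	net effect:  maj_flt +1, min_flt +0, PGFAULT +1 per logical fault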

Cc: Ying Han <yinghan@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 arch/x86/mm/fault.c |    2 ++
 mm/memory.c         |   22 ++++++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

--- mm.orig/arch/x86/mm/fault.c
+++ mm/arch/x86/mm/fault.c
@@ -1160,6 +1160,8 @@ good_area:
 	if (fault & VM_FAULT_RETRY) {
 		if (retry_flag) {
 			retry_flag = 0;
+			tsk->maj_flt++;
+			tsk->min_flt--;
 			goto retry;
 		}
 		BUG();
--- mm.orig/mm/memory.c
+++ mm/mm/memory.c
@@ -2882,26 +2882,32 @@ int handle_mm_fault(struct mm_struct *mm
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
+	int ret;
 
 	__set_current_state(TASK_RUNNING);
 
-	count_vm_event(PGFAULT);
-
-	if (unlikely(is_vm_hugetlb_page(vma)))
-		return hugetlb_fault(mm, vma, address, write_access);
+	if (unlikely(is_vm_hugetlb_page(vma))) {
+		ret = hugetlb_fault(mm, vma, address, write_access);
+		goto out;
+	}
 
+	ret = VM_FAULT_OOM;
 	pgd = pgd_offset(mm, address);
 	pud = pud_alloc(mm, pgd, address);
 	if (!pud)
-		return VM_FAULT_OOM;
+		goto out;
 	pmd = pmd_alloc(mm, pud, address);
 	if (!pmd)
-		return VM_FAULT_OOM;
+		goto out;
 	pte = pte_alloc_map(mm, pmd, address);
 	if (!pte)
-		return VM_FAULT_OOM;
+		goto out;
 
-	return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
+	ret = handle_pte_fault(mm, vma, address, pte, pmd, write_access);
+out:
+	if (!(ret & VM_FAULT_RETRY))
+		count_vm_event(PGFAULT);
+	return ret;
 }
 
 #ifndef __PAGETABLE_PUD_FOLDED

-- 


* [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
  2009-04-07  7:17 ` [PATCH 01/14] mm: fix find_lock_page_retry() return value parsing Wu Fengguang
  2009-04-07  7:17 ` [PATCH 02/14] mm: fix major/minor fault accounting on retried fault Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07 20:03   ` Ying Han
  2009-04-07  7:17 ` [PATCH 04/14] mm: reduce duplicate page fault code Wu Fengguang
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Wu Fengguang

[-- Attachment #1: memory-fault-retry-simp.patch --]
[-- Type: text/plain, Size: 910 bytes --]

Cc: Ying Han <yinghan@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/memory.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- mm.orig/mm/memory.c
+++ mm/mm/memory.c
@@ -2766,10 +2766,8 @@ static int do_linear_fault(struct mm_str
 {
 	pgoff_t pgoff = (((address & PAGE_MASK)
 			- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
-	int write = write_access & ~FAULT_FLAG_RETRY;
-	unsigned int flags = (write ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);
 
-	flags |= (write_access & FAULT_FLAG_RETRY);
 	pte_unmap(page_table);
 	return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
 }

-- 


* [PATCH 04/14] mm: reduce duplicate page fault code
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (2 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 05/14] readahead: account mmap_miss for VM_FAULT_RETRY Wu Fengguang
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Wu Fengguang

[-- Attachment #1: filemap-fault-cleanup.patch --]
[-- Type: text/plain, Size: 2155 bytes --]

Restore the simplicity of the filemap_fault():no_cached_page block.
The VM_FAULT_RETRY case is not all that different.

No readahead/readaround will be performed after no_cached_page,
because no_cached_page either means MADV_RANDOM or some error condition.

Cc: Ying Han <yinghan@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/filemap.c |   22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1565,7 +1565,6 @@ int filemap_fault(struct vm_area_struct 
 retry_find:
 	page = find_lock_page(mapping, vmf->pgoff);
 
-retry_find_nopage:
 	/*
 	 * For sequential accesses, we use the generic readahead logic.
 	 */
@@ -1615,6 +1614,7 @@ retry_find_nopage:
 				start = vmf->pgoff - ra_pages / 2;
 			do_page_cache_readahead(mapping, file, start, ra_pages);
 		}
+retry_find_retry:
 		retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
 				vma, &page, retry_flag);
 		if (retry_ret == VM_FAULT_RETRY)
@@ -1626,7 +1626,6 @@ retry_find_nopage:
 	if (!did_readaround)
 		ra->mmap_miss--;
 
-retry_page_update:
 	/*
 	 * We have a locked page in the page cache, now we need to check
 	 * that it's up-to-date. If not, it is going to be due to an error.
@@ -1662,23 +1661,8 @@ no_cached_page:
 	 * In the unlikely event that someone removed it in the
 	 * meantime, we'll just come back here and read it again.
 	 */
-	if (error >= 0) {
-		/*
-		 * If caller cannot tolerate a retry in the ->fault path
-		 * go back to check the page again.
-		 */
-		if (!retry_flag)
-			goto retry_find;
-
-		retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
-					vma, &page, retry_flag);
-		if (retry_ret == VM_FAULT_RETRY)
-			return retry_ret;
-		if (!page)
-			goto retry_find_nopage;
-		else
-			goto retry_page_update;
-	}
+	if (error >= 0)
+		goto retry_find_retry;
 
 	/*
 	 * An error return from page_cache_read can result if the

-- 


* [PATCH 05/14] readahead: account mmap_miss for VM_FAULT_RETRY
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (3 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 04/14] mm: reduce duplicate page fault code Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 06/14] readahead: move max_sane_readahead() calls into force_page_cache_readahead() Wu Fengguang
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Wu Fengguang

[-- Attachment #1: readahead-mmap_miss-retry.patch --]
[-- Type: text/plain, Size: 2161 bytes --]

The VM_FAULT_RETRY case introduced a performance bug that leads to
excessive/unconditional mmap read-arounds for wildly random mmap reads.

A retried page fault means an mmap readahead miss (mmap_miss++) followed by
a hit (mmap_miss--) on the same page. This pins mmap_miss in place, and thus
prevents mmap read-around from ever being turned off for wildly random reads.
Fix it with an extra mmap_miss increment that counteracts the subsequent
retry hit.

Also make mmap_miss a more robust 'unsigned int', so that if mmap_miss ever
goes out of range, it only creates _temporary_ performance impacts.
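
A worked example of the accounting (illustrative, based on the description
above): without this fix, each retried random fault does mmap_miss++ on the
initial miss and mmap_miss-- when the retry finds the page, a net change of
0, so mmap_miss can never climb past MMAP_LOTSAMISS and read-around is never
disabled. With the extra increment on the VM_FAULT_RETRY return, the net
change is +1 per missed page, so sufficiently many misses still turn
read-around off, as intended.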

Cc: Ying Han <yinghan@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/fs.h |    2 +-
 mm/filemap.c       |    8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1574,8 +1574,10 @@ retry_find:
 							   vmf->pgoff, 1);
 			retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
 						vma, &page, retry_flag);
-			if (retry_ret == VM_FAULT_RETRY)
+			if (retry_ret == VM_FAULT_RETRY) {
+				ra->mmap_miss++; /* counteract the followed retry hit */
 				return retry_ret;
+			}
 			if (!page)
 				goto no_cached_page;
 		}
@@ -1617,8 +1619,10 @@ retry_find:
 retry_find_retry:
 		retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
 				vma, &page, retry_flag);
-		if (retry_ret == VM_FAULT_RETRY)
+		if (retry_ret == VM_FAULT_RETRY) {
+			ra->mmap_miss++; /* counteract the followed retry hit */
 			return retry_ret;
+		}
 		if (!page)
 			goto no_cached_page;
 	}
--- mm.orig/include/linux/fs.h
+++ mm/include/linux/fs.h
@@ -824,7 +824,7 @@ struct file_ra_state {
 					   there are only # of pages ahead */
 
 	unsigned int ra_pages;		/* Maximum readahead window */
-	int mmap_miss;			/* Cache miss stat for mmap accesses */
+	unsigned int mmap_miss;		/* Cache miss stat for mmap accesses */
 	loff_t prev_pos;		/* Cache last read() position */
 };
 

-- 


* [PATCH 06/14] readahead: move max_sane_readahead() calls into force_page_cache_readahead()
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (4 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 05/14] readahead: account mmap_miss for VM_FAULT_RETRY Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 07/14] readahead: apply max_sane_readahead() limit in ondemand_readahead() Wu Fengguang
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Nick Piggin,
	Linus Torvalds, Wu Fengguang

[-- Attachment #1: readahead-move-max_sane_readahead.patch --]
[-- Type: text/plain, Size: 1843 bytes --]

Impact: code simplification.

Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/fadvise.c   |    2 +-
 mm/filemap.c   |    3 +--
 mm/madvise.c   |    3 +--
 mm/readahead.c |    1 +
 4 files changed, 4 insertions(+), 5 deletions(-)

--- mm.orig/mm/fadvise.c
+++ mm/mm/fadvise.c
@@ -101,7 +101,7 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, lof
 		
 		ret = force_page_cache_readahead(mapping, file,
 				start_index,
-				max_sane_readahead(nrpages));
+				nrpages);
 		if (ret > 0)
 			ret = 0;
 		break;
--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1458,8 +1458,7 @@ do_readahead(struct address_space *mappi
 	if (!mapping || !mapping->a_ops || !mapping->a_ops->readpage)
 		return -EINVAL;
 
-	force_page_cache_readahead(mapping, filp, index,
-					max_sane_readahead(nr));
+	force_page_cache_readahead(mapping, filp, index, nr);
 	return 0;
 }
 
--- mm.orig/mm/madvise.c
+++ mm/mm/madvise.c
@@ -123,8 +123,7 @@ static long madvise_willneed(struct vm_a
 		end = vma->vm_end;
 	end = ((end - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 
-	force_page_cache_readahead(file->f_mapping,
-			file, start, max_sane_readahead(end - start));
+	force_page_cache_readahead(file->f_mapping, file, start, end - start);
 	return 0;
 }
 
--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -223,6 +223,7 @@ int force_page_cache_readahead(struct ad
 	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
 		return -EINVAL;
 
+	nr_to_read = max_sane_readahead(nr_to_read);
 	while (nr_to_read) {
 		int err;
 

-- 


* [PATCH 07/14] readahead: apply max_sane_readahead() limit in ondemand_readahead()
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (5 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 06/14] readahead: move max_sane_readahead() calls into force_page_cache_readahead() Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 08/14] readahead: remove one unnecessary radix tree lookup Wu Fengguang
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Nick Piggin,
	Linus Torvalds, Wu Fengguang

[-- Attachment #1: readahead-sane-max.patch --]
[-- Type: text/plain, Size: 852 bytes --]

Just in case someone aggressively sets a huge readahead size.

Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/readahead.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -382,7 +382,7 @@ ondemand_readahead(struct address_space 
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	int	max = ra->ra_pages;	/* max readahead pages */
+	unsigned long max = max_sane_readahead(ra->ra_pages);
 	pgoff_t prev_offset;
 	int	sequential;
 

-- 


* [PATCH 08/14] readahead: remove one unnecessary radix tree lookup
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (6 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 07/14] readahead: apply max_sane_readahead() limit in ondemand_readahead() Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 09/14] readahead: increase interleaved readahead size Wu Fengguang
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Wu Fengguang

[-- Attachment #1: readahead-interleaved-offset.patch --]
[-- Type: text/plain, Size: 884 bytes --]

(hit_readahead_marker != 0) means the page at @offset is present, so we
can search for the first non-present page starting from @offset+1.
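
A worked example (illustrative numbers): with offset=100 and max=32, the old
call radix_tree_next_hole(&mapping->page_tree, 100, 33) scanned slots
100..132, wasting one lookup on slot 100 that is already known to be present;
the new call radix_tree_next_hole(&mapping->page_tree, 101, 32) scans at
most slots 101..132.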

Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/readahead.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -420,7 +420,7 @@ ondemand_readahead(struct address_space 
 		pgoff_t start;
 
 		rcu_read_lock();
-		start = radix_tree_next_hole(&mapping->page_tree, offset,max+1);
+		start = radix_tree_next_hole(&mapping->page_tree, offset+1,max);
 		rcu_read_unlock();
 
 		if (!start || start - offset > max)

-- 


* [PATCH 09/14] readahead: increase interleaved readahead size
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (7 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 08/14] readahead: remove one unnecessary radix tree lookup Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 10/14] readahead: remove sync/async readahead call dependency Wu Fengguang
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Wu Fengguang

[-- Attachment #1: readahead-interleaved-size.patch --]
[-- Type: text/plain, Size: 808 bytes --]

Make sure the interleaved readahead size is larger than the request size.
This also makes the readahead window grow more quickly.
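
A worked example (illustrative numbers, not from the patch): suppose
offset=10, start=12 and req_size=4. The old code set ra->size = start -
offset = 2 pages, smaller than the request itself; with this patch,
ra->size = 2 + 4 = 6 before get_next_ra_size() scales it toward max, so
the window ramps up from 6 pages instead of 2.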

Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/readahead.c |    1 +
 1 file changed, 1 insertion(+)

--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -428,6 +428,7 @@ ondemand_readahead(struct address_space 
 
 		ra->start = start;
 		ra->size = start - offset;	/* old async_size */
+		ra->size += req_size;
 		ra->size = get_next_ra_size(ra, max);
 		ra->async_size = ra->size;
 		goto readit;

-- 


* [PATCH 10/14] readahead: remove sync/async readahead call dependency
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (8 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 09/14] readahead: increase interleaved readahead size Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead Wu Fengguang
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Nick Piggin,
	Linus Torvalds, Wu Fengguang

[-- Attachment #1: readahead-remove-call-dependancy.patch --]
[-- Type: text/plain, Size: 2013 bytes --]

The readahead call scheme is error-prone in that it expects the call sites
to check for async readahead after doing any sync one. I.e.

			if (!page)
				page_cache_sync_readahead();
			page = find_get_page();
			if (page && PageReadahead(page))
				page_cache_async_readahead();

This is because PG_readahead could be set by a sync readahead for the _current_
newly faulted-in page, and the readahead code simply expects one more callback
on the same page to start the async readahead. If the caller fails to do so, it
will miss the PG_readahead bit and never be able to start an async readahead.

Eliminate this insane constraint by piggy-backing the async part into the
current readahead window.

Now if an async readahead should be started immediately after a sync one,
the readahead logic itself will do it. So the following code becomes valid:
(the 'else' in particular)

			if (!page)
				page_cache_sync_readahead();
			else if (PageReadahead(page))
				page_cache_async_readahead();

Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/readahead.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -446,6 +446,16 @@ ondemand_readahead(struct address_space 
 	ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
 readit:
+	/*
+	 * Will this read hit the readahead marker made by itself?
+	 * If so, trigger the readahead marker hit now, and merge
+	 * the resulted next readahead window into the current one.
+	 */
+	if (offset == ra->start && ra->size == ra->async_size) {
+		ra->async_size = get_next_ra_size(ra, max);
+		ra->size += ra->async_size;
+	}
+
 	return ra_submit(ra, mapping, filp);
 }
 

-- 


* [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (9 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 10/14] readahead: remove sync/async readahead call dependency Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 12/14] readahead: sequential mmap readahead Wu Fengguang
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Pavel Levshin, wli,
	Nick Piggin, Wu Fengguang, Linus Torvalds

[-- Attachment #1: readahead-mmap-split-code-and-cleanup.patch --]
[-- Type: text/plain, Size: 9841 bytes --]

From: Linus Torvalds <torvalds@linux-foundation.org>

This shouldn't really change behavior all that much, but the single
rather complex function with read-ahead inside a loop etc is broken up
into more manageable pieces.

The behaviour is also less subtle, with the read-ahead being done up-front 
rather than inside some subtle loop and thus avoiding the now unnecessary 
extra state variables (ie "did_readaround" is gone).

Fengguang: the code split in fact fixed a bug reported by Pavel Levshin:
the PGMAJFAULT accounting used to be bypassed when MADV_RANDOM is set, in
which case the original code jumps directly to the no_cached_page path.

Cc: Pavel Levshin <lpk@581.spb.su>
Cc: wli@movementarian.org
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Ok, so this is something I did in Mexico when I wasn't scuba-diving, and 
was "watching" the kids at the pool. It was brought on by looking at git 
mmap file behaviour under cold-cache behaviour: git does ok, but my laptop 
disk is really slow, and I tried to verify that the kernel did a 
reasonable job of read-ahead when taking page faults.

I think it did, but quite frankly, the filemap_fault() code was totally 
unreadable. So this separates out the read-ahead cases, and adds more 
comments, and also changes it so that we do asynchronous read-ahead 
*before* we actually wait for the page to become unlocked.

Not that it seems to make any real difference on my laptop, but I really 
hated how it was doing a

	page = find_lock_page(..)

and then doing read-ahead after that: which just guarantees that we have 
to wait for any outstanding IO on "page" to complete before we can even 
submit any new read-ahead! That just seems totally broken!

So it replaces the "find_lock_page()" at the top with a broken-out page 
cache lookup, which allows us to look at the page state flags and make 
appropriate decisions on what we should do without waiting for the locked 
bit to clear.

It does add many more lines than it removes:

	 mm/filemap.c |  192 +++++++++++++++++++++++++++++++++++++++-------------------
	 1 files changed, 130 insertions(+), 62 deletions(-)

but that's largely due to (a) the new function headers etc due to the 
split-up and (b) new or extended comments especially about the helper 
functions. The code, in many ways, is actually simpler, apart from the 
fairly trivial expansion of the equivalent of "find_lock_page()" into the 
function.

Comments? I tried to avoid changing the read-ahead logic itself, although 
the old code did some strange things like doing *both* async readahead and 
then looking up the page and doing sync readahead (which I think was just 
due to the code being so damn messily organized, not on purpose).

			Linus

---
 mm/filemap.c |  164 ++++++++++++++++++++++++++-----------------------
 1 file changed, 90 insertions(+), 74 deletions(-)

--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1524,6 +1524,68 @@ static int page_cache_read(struct file *
 
 #define MMAP_LOTSAMISS  (100)
 
+/*
+ * Synchronous readahead happens when we don't even find
+ * a page in the page cache at all.
+ */
+static void do_sync_mmap_readahead(struct vm_area_struct *vma,
+				   struct file_ra_state *ra,
+				   struct file *file,
+				   pgoff_t offset)
+{
+	unsigned long ra_pages;
+	struct address_space *mapping = file->f_mapping;
+
+	/* If we don't want any read-ahead, don't bother */
+	if (VM_RandomReadHint(vma))
+		return;
+
+	if (VM_SequentialReadHint(vma)) {
+		page_cache_sync_readahead(mapping, ra, file, offset, 1);
+		return;
+	}
+
+	if (ra->mmap_miss < INT_MAX)
+		ra->mmap_miss++;
+
+	/*
+	 * Do we miss much more than hit in this file? If so,
+	 * stop bothering with read-ahead. It will only hurt.
+	 */
+	if (ra->mmap_miss > MMAP_LOTSAMISS)
+		return;
+
+	ra_pages = max_sane_readahead(ra->ra_pages);
+	if (ra_pages) {
+		pgoff_t start = 0;
+
+		if (offset > ra_pages / 2)
+			start = offset - ra_pages / 2;
+		do_page_cache_readahead(mapping, file, start, ra_pages);
+	}
+}
+
+/*
+ * Asynchronous readahead happens when we find the page and PG_readahead,
+ * so we want to possibly extend the readahead further..
+ */
+static void do_async_mmap_readahead(struct vm_area_struct *vma,
+				    struct file_ra_state *ra,
+				    struct file *file,
+				    struct page *page,
+				    pgoff_t offset)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	/* If we don't want any read-ahead, don't bother */
+	if (VM_RandomReadHint(vma))
+		return;
+	if (ra->mmap_miss > 0)
+		ra->mmap_miss--;
+	if (PageReadahead(page))
+		page_cache_async_readahead(mapping, ra, file, page, offset, 1);
+}
+
 /**
  * filemap_fault - read in file data for page fault handling
  * @vma:	vma in which the fault was taken
@@ -1543,80 +1605,43 @@ int filemap_fault(struct vm_area_struct 
 	struct address_space *mapping = file->f_mapping;
 	struct file_ra_state *ra = &file->f_ra;
 	struct inode *inode = mapping->host;
+	pgoff_t offset = vmf->pgoff;
 	struct page *page;
 	pgoff_t size;
-	int did_readaround = 0;
 	int ret = 0;
 	int retry_flag = vmf->flags & FAULT_FLAG_RETRY;
 	int retry_ret;
 
 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	if (vmf->pgoff >= size)
+	if (offset >= size)
 		return VM_FAULT_SIGBUS;
 
-	/* If we don't want any read-ahead, don't bother */
-	if (VM_RandomReadHint(vma))
-		goto no_cached_page;
-
 	/*
 	 * Do we have something in the page cache already?
 	 */
-retry_find:
-	page = find_lock_page(mapping, vmf->pgoff);
-
-	/*
-	 * For sequential accesses, we use the generic readahead logic.
-	 */
-	if (VM_SequentialReadHint(vma)) {
-		if (!page) {
-			page_cache_sync_readahead(mapping, ra, file,
-							   vmf->pgoff, 1);
-			retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
-						vma, &page, retry_flag);
-			if (retry_ret == VM_FAULT_RETRY) {
-				ra->mmap_miss++; /* counteract the followed retry hit */
-				return retry_ret;
-			}
-			if (!page)
-				goto no_cached_page;
-		}
-		if (PageReadahead(page)) {
-			page_cache_async_readahead(mapping, ra, file, page,
-							   vmf->pgoff, 1);
-		}
-	}
-
-	if (!page) {
-		unsigned long ra_pages;
-
-		ra->mmap_miss++;
+	page = find_get_page(mapping, offset);
 
+	if (likely(page)) {
 		/*
-		 * Do we miss much more than hit in this file? If so,
-		 * stop bothering with read-ahead. It will only hurt.
+		 * We found the page, so try async readahead before
+		 * waiting for the lock.
 		 */
-		if (ra->mmap_miss > MMAP_LOTSAMISS)
-			goto no_cached_page;
+		do_async_mmap_readahead(vma, ra, file, page, offset);
+		lock_page(page);
 
-		/*
-		 * To keep the pgmajfault counter straight, we need to
-		 * check did_readaround, as this is an inner loop.
-		 */
-		if (!did_readaround) {
-			ret = VM_FAULT_MAJOR;
-			count_vm_event(PGMAJFAULT);
-		}
-		did_readaround = 1;
-		ra_pages = max_sane_readahead(file->f_ra.ra_pages);
-		if (ra_pages) {
-			pgoff_t start = 0;
-
-			if (vmf->pgoff > ra_pages / 2)
-				start = vmf->pgoff - ra_pages / 2;
-			do_page_cache_readahead(mapping, file, start, ra_pages);
+		/* Did it get truncated? */
+		if (unlikely(page->mapping != mapping)) {
+			unlock_page(page);
+			put_page(page);
+			goto no_cached_page;
 		}
-retry_find_retry:
-		retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
+	} else {
+		/* No page in the page cache at all */
+		do_sync_mmap_readahead(vma, ra, file, offset);
+		count_vm_event(PGMAJFAULT);
+		ret = VM_FAULT_MAJOR;
+retry_find:
+		retry_ret = find_lock_page_retry(mapping, offset,
 				vma, &page, retry_flag);
 		if (retry_ret == VM_FAULT_RETRY) {
 			ra->mmap_miss++; /* counteract the followed retry hit */
@@ -1626,9 +1651,6 @@ retry_find_retry:
 			goto no_cached_page;
 	}
 
-	if (!did_readaround)
-		ra->mmap_miss--;
-
 	/*
 	 * We have a locked page in the page cache, now we need to check
 	 * that it's up-to-date. If not, it is going to be due to an error.
@@ -1636,19 +1658,19 @@ retry_find_retry:
 	if (unlikely(!PageUptodate(page)))
 		goto page_not_uptodate;
 
-	/* Must recheck i_size under page lock */
+	/*
+	 * Found the page and have a reference on it.
+	 * We must recheck i_size under page lock.
+	 */
 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	if (unlikely(vmf->pgoff >= size)) {
+	if (unlikely(offset >= size)) {
 		unlock_page(page);
 		page_cache_release(page);
 		return VM_FAULT_SIGBUS;
 	}
 
-	/*
-	 * Found the page and have a reference on it.
-	 */
 	update_page_reclaim_stat(page);
-	ra->prev_pos = (loff_t)page->index << PAGE_CACHE_SHIFT;
+	ra->prev_pos = (loff_t)offset << PAGE_CACHE_SHIFT;
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;
 
@@ -1657,7 +1679,7 @@ no_cached_page:
 	 * We're only likely to ever get here if MADV_RANDOM is in
 	 * effect.
 	 */
-	error = page_cache_read(file, vmf->pgoff);
+	error = page_cache_read(file, offset);
 
 	/*
 	 * The page we want has now been added to the page cache.
@@ -1665,7 +1687,7 @@ no_cached_page:
 	 * meantime, we'll just come back here and read it again.
 	 */
 	if (error >= 0)
-		goto retry_find_retry;
+		goto retry_find;
 
 	/*
 	 * An error return from page_cache_read can result if the
@@ -1677,12 +1699,6 @@ no_cached_page:
 	return VM_FAULT_SIGBUS;
 
 page_not_uptodate:
-	/* IO error path */
-	if (!did_readaround) {
-		ret = VM_FAULT_MAJOR;
-		count_vm_event(PGMAJFAULT);
-	}
-
 	/*
 	 * Umm, take care of errors if the page isn't up-to-date.
 	 * Try to re-read it _once_. We do this synchronously,

-- 


* [PATCH 12/14] readahead: sequential mmap readahead
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (10 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 13/14] readahead: enforce full readahead size on async " Wu Fengguang
  2009-04-07  7:17 ` [PATCH 14/14] readahead: record mmap read-around states in file_ra_state Wu Fengguang
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Nick Piggin,
	Linus Torvalds, Wu Fengguang

[-- Attachment #1: readahead-mmap-sequential-readahead.patch --]
[-- Type: text/plain, Size: 2021 bytes --]

Auto-detect sequential mmap reads and do readahead for them.

The sequential mmap readahead will be triggered when
- sync readahead: it's a major fault and (prev_offset == offset-1);
- async readahead: minor fault on PG_readahead page with valid readahead state.

The benefits of doing readahead instead of read-around:
- less I/O wait thanks to async readahead
- double the real I/O size, with no more readahead cache hits

The single stream case is improved a little.
For 100,000 sequential mmap reads (user/system/total times in seconds):

                                    user       system    cpu        total
(1-1)  plain -mm, 128KB readaround: 3.224      2.554     48.40%     11.838
(1-2)  plain -mm, 256KB readaround: 3.170      2.392     46.20%     11.976
(2)  patched -mm, 128KB readahead:  3.117      2.448     47.33%     11.607

The patched kernel (2) has the smallest total time, since it has no cache-hit
overheads and less I/O block time (thanks to async readahead). Here the I/O
size makes little difference, since there's only a single stream.

Note that (1-1)'s real I/O size is 64KB and (1-2)'s real I/O size is 128KB,
since half of the read-around pages will be readahead cache hits.

This is going to make _real_ differences for _concurrent_ IO streams.
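
For reference, here is a minimal userspace sketch (my own illustration, not
part of the patch) of the access pattern the new heuristic detects. It
faults in an mmap()ed file one page after another, so each fault satisfies
prev_offset == offset-1:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		struct stat st;
		long page = sysconf(_SC_PAGESIZE);
		long sum = 0;
		char *p;
		off_t i;
		int fd;

		if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
			return 1;
		if (fstat(fd, &st) < 0)
			return 1;
		p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
		if (p == MAP_FAILED)
			return 1;
		/* touch one byte per page: each fault is at prev page + 1 */
		for (i = 0; i < st.st_size; i += page)
			sum += p[i];
		printf("sum = %ld\n", sum);
		munmap(p, st.st_size);
		close(fd);
		return 0;
	}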

Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/filemap.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1540,7 +1540,8 @@ static void do_sync_mmap_readahead(struc
 	if (VM_RandomReadHint(vma))
 		return;
 
-	if (VM_SequentialReadHint(vma)) {
+	if (VM_SequentialReadHint(vma) ||
+			offset - 1 == (ra->prev_pos >> PAGE_CACHE_SHIFT)) {
 		page_cache_sync_readahead(mapping, ra, file, offset, 1);
 		return;
 	}

-- 


* [PATCH 13/14] readahead: enforce full readahead size on async mmap readahead
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (11 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 12/14] readahead: sequential mmap readahead Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  2009-04-07  7:17 ` [PATCH 14/14] readahead: record mmap read-around states in file_ra_state Wu Fengguang
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Linus Torvalds,
	Nick Piggin, Wu Fengguang

[-- Attachment #1: readahead-mmap-full-async-readahead-size.patch --]
[-- Type: text/plain, Size: 2711 bytes --]

We need this in one particular case and two more general ones.

Now we do async readahead for sequential mmap reads, and do it with the help of
PG_readahead. For normal reads, PG_readahead is a sufficient condition to do
a sequential readahead. But unfortunately, for mmap reads, there is a tiny nuisance:

[11736.998347] readahead-init0(process: sh/23926, file: sda1/w3m, offset=0:4503599627370495, ra=0+4-3) = 4
[11737.014985] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=290+32-0) = 17
[11737.019488] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=118+32-0) = 32
[11737.024921] readahead-interleaved(process: w3m/23926, file: sda1/w3m, offset=0:2, ra=4+6-6) = 6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                 ~~~~~~~~~~~~~
An unfavorably small readahead. The original dumb read-around size could be more efficient.

That happened because ld-linux.so does a read(832) in L1 before mmap(),
which triggers a 4-page readahead, with the second page tagged PG_readahead.

L0: open("/lib/libc.so.6", O_RDONLY)        = 3
L1: read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\342"..., 832) = 832
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L2: fstat(3, {st_mode=S_IFREG|0755, st_size=1420624, ...}) = 0
L3: mmap(NULL, 3527256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac6e51d000
L4: mprotect(0x7fac6e671000, 2097152, PROT_NONE) = 0
L5: mmap(0x7fac6e871000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x154000) = 0x7fac6e871000
L6: mmap(0x7fac6e876000, 16984, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fac6e876000
L7: close(3)                                = 0

In general, the PG_readahead flag will also be hit in these cases:
- sequential reads
- clustered random reads
A full readahead size is desirable in both cases.
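
A sketch of the effect (illustrative, with assumed numbers): with
ra_pages=32, the old call page_cache_async_readahead(..., offset, 1) sized
the next window from a 1-page request, which is how the small 6-page
"readahead-interleaved" window above came about; passing ra->ra_pages
instead lets the window ramp up to the full 32 pages on the first
PG_readahead hit.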

Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/filemap.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1584,7 +1584,8 @@ static void do_async_mmap_readahead(stru
 	if (ra->mmap_miss > 0)
 		ra->mmap_miss--;
 	if (PageReadahead(page))
-		page_cache_async_readahead(mapping, ra, file, page, offset, 1);
+		page_cache_async_readahead(mapping, ra, file,
+					   page, offset, ra->ra_pages);
 }
 
 /**

-- 


* [PATCH 14/14] readahead: record mmap read-around states in file_ra_state
  2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
                   ` (12 preceding siblings ...)
  2009-04-07  7:17 ` [PATCH 13/14] readahead: enforce full readahead size on async " Wu Fengguang
@ 2009-04-07  7:17 ` Wu Fengguang
  13 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ying Han, LKML, linux-fsdevel, linux-mm, Nick Piggin, Linus Torvalds

[-- Attachment #1: readahead-mmap-readaround-use-ra_submit.patch --]
[-- Type: text/plain, Size: 4123 bytes --]

Mmap read-around now shares the same code style and data structures
with the readahead code.

This also removes do_page_cache_readahead().
Its last user, mmap read-around, has been changed to call ra_submit().

The no-readahead-if-congested logic is dropped along the way. Users are
pretty sensitive to slow loading of executables, so it's unfavorable to
disable mmap read-around on a congested queue.

Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 include/linux/mm.h |    5 +++--
 mm/filemap.c       |   12 +++++++-----
 mm/readahead.c     |   23 ++---------------------
 3 files changed, 12 insertions(+), 28 deletions(-)

--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1556,13 +1556,15 @@ static void do_sync_mmap_readahead(struc
 	if (ra->mmap_miss > MMAP_LOTSAMISS)
 		return;
 
+	/*
+	 * mmap read-around
+	 */
 	ra_pages = max_sane_readahead(ra->ra_pages);
 	if (ra_pages) {
-		pgoff_t start = 0;
-
-		if (offset > ra_pages / 2)
-			start = offset - ra_pages / 2;
-		do_page_cache_readahead(mapping, file, start, ra_pages);
+		ra->start = max_t(long, 0, offset - ra_pages/2);
+		ra->size = ra_pages;
+		ra->async_size = 0;
+		ra_submit(ra, mapping, file);
 	}
 }
 
--- mm.orig/include/linux/mm.h
+++ mm/include/linux/mm.h
@@ -1183,8 +1183,6 @@ void task_dirty_inc(struct task_struct *
 #define VM_MAX_READAHEAD	128	/* kbytes */
 #define VM_MIN_READAHEAD	16	/* kbytes (includes current page) */
 
-int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
-			pgoff_t offset, unsigned long nr_to_read);
 int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			pgoff_t offset, unsigned long nr_to_read);
 
@@ -1202,6 +1200,9 @@ void page_cache_async_readahead(struct a
 				unsigned long size);
 
 unsigned long max_sane_readahead(unsigned long nr);
+unsigned long ra_submit(struct file_ra_state *ra,
+		        struct address_space *mapping,
+			struct file *filp);
 
 /* Do stack extension */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -146,15 +146,12 @@ out:
 }
 
 /*
- * do_page_cache_readahead actually reads a chunk of disk.  It allocates all
+ * __do_page_cache_readahead() actually reads a chunk of disk.  It allocates all
  * the pages first, then submits them all for I/O. This avoids the very bad
  * behaviour which would occur if page allocations are causing VM writeback.
  * We really don't want to intermingle reads and writes like that.
  *
  * Returns the number of pages requested, or the maximum amount of I/O allowed.
- *
- * do_page_cache_readahead() returns -1 if it encountered request queue
- * congestion.
  */
 static int
 __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
@@ -245,22 +242,6 @@ int force_page_cache_readahead(struct ad
 }
 
 /*
- * This version skips the IO if the queue is read-congested, and will tell the
- * block layer to abandon the readahead if request allocation would block.
- *
- * force_page_cache_readahead() will ignore queue congestion and will block on
- * request queues.
- */
-int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
-			pgoff_t offset, unsigned long nr_to_read)
-{
-	if (bdi_read_congested(mapping->backing_dev_info))
-		return -1;
-
-	return __do_page_cache_readahead(mapping, filp, offset, nr_to_read, 0);
-}
-
-/*
  * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
  * sensible upper limit.
  */
@@ -285,7 +266,7 @@ subsys_initcall(readahead_init);
 /*
  * Submit IO for the read-ahead request in file_ra_state.
  */
-static unsigned long ra_submit(struct file_ra_state *ra,
+unsigned long ra_submit(struct file_ra_state *ra,
 		       struct address_space *mapping, struct file *filp)
 {
 	int actual;

-- 


* Re: [PATCH 02/14] mm: fix major/minor fault accounting on retried fault
  2009-04-07  7:17 ` [PATCH 02/14] mm: fix major/minor fault accounting on retried fault Wu Fengguang
@ 2009-04-07 19:58   ` Ying Han
  2009-04-07 22:45     ` Wu Fengguang
  0 siblings, 1 reply; 23+ messages in thread
From: Ying Han @ 2009-04-07 19:58 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: Andrew Morton, LKML, linux-fsdevel, linux-mm

On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> VM_FAULT_RETRY does make major/minor faults accounting a bit twisted..
>
> Cc: Ying Han <yinghan@google.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  arch/x86/mm/fault.c |    2 ++
>  mm/memory.c         |   22 ++++++++++++++--------
>  2 files changed, 16 insertions(+), 8 deletions(-)
>
> --- mm.orig/arch/x86/mm/fault.c
> +++ mm/arch/x86/mm/fault.c
> @@ -1160,6 +1160,8 @@ good_area:
>        if (fault & VM_FAULT_RETRY) {
>                if (retry_flag) {
>                        retry_flag = 0;
> +                       tsk->maj_flt++;
> +                       tsk->min_flt--;
>                        goto retry;
>                }
>                BUG();
Sorry, a little bit confused here: are we assuming the retried fault will
always be accounted as a minor fault?


> --- mm.orig/mm/memory.c
> +++ mm/mm/memory.c
> @@ -2882,26 +2882,32 @@ int handle_mm_fault(struct mm_struct *mm
>        pud_t *pud;
>        pmd_t *pmd;
>        pte_t *pte;
> +       int ret;
>
>        __set_current_state(TASK_RUNNING);
>
> -       count_vm_event(PGFAULT);
> -
> -       if (unlikely(is_vm_hugetlb_page(vma)))
> -               return hugetlb_fault(mm, vma, address, write_access);
> +       if (unlikely(is_vm_hugetlb_page(vma))) {
> +               ret = hugetlb_fault(mm, vma, address, write_access);
> +               goto out;
> +       }
>
> +       ret = VM_FAULT_OOM;
>        pgd = pgd_offset(mm, address);
>        pud = pud_alloc(mm, pgd, address);
>        if (!pud)
> -               return VM_FAULT_OOM;
> +               goto out;
>        pmd = pmd_alloc(mm, pud, address);
>        if (!pmd)
> -               return VM_FAULT_OOM;
> +               goto out;
>        pte = pte_alloc_map(mm, pmd, address);
>        if (!pte)
> -               return VM_FAULT_OOM;
> +               goto out;
>
> -       return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
> +       ret = handle_pte_fault(mm, vma, address, pte, pmd, write_access);
> +out:
> +       if (!(ret & VM_FAULT_RETRY))
> +               count_vm_event(PGFAULT);
> +       return ret;
>  }
>
>  #ifndef __PAGETABLE_PUD_FOLDED
>
> --
>
>


* Re: [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
  2009-04-07  7:17 ` [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code Wu Fengguang
@ 2009-04-07 20:03   ` Ying Han
  2009-04-07 23:27     ` Wu Fengguang
  0 siblings, 1 reply; 23+ messages in thread
From: Ying Han @ 2009-04-07 20:03 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: Andrew Morton, LKML, linux-fsdevel, linux-mm

On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> Cc: Ying Han <yinghan@google.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  mm/memory.c |    4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> --- mm.orig/mm/memory.c
> +++ mm/mm/memory.c
> @@ -2766,10 +2766,8 @@ static int do_linear_fault(struct mm_str
>  {
>        pgoff_t pgoff = (((address & PAGE_MASK)
>                        - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> -       int write = write_access & ~FAULT_FLAG_RETRY;
> -       unsigned int flags = (write ? FAULT_FLAG_WRITE : 0);
> +       unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);
>
> -       flags |= (write_access & FAULT_FLAG_RETRY);
>        pte_unmap(page_table);
>        return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
>  }
So, we got rid of the FAULT_FLAG_RETRY flag?
> --
>
>


* Re: [PATCH 02/14] mm: fix major/minor fault accounting on retried fault
  2009-04-07 19:58   ` Ying Han
@ 2009-04-07 22:45     ` Wu Fengguang
  0 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07 22:45 UTC (permalink / raw)
  To: Ying Han; +Cc: Andrew Morton, LKML, linux-fsdevel, linux-mm

On Wed, Apr 08, 2009 at 03:58:16AM +0800, Ying Han wrote:
> On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > VM_FAULT_RETRY does make major/minor faults accounting a bit twisted..
> >
> > Cc: Ying Han <yinghan@google.com>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  arch/x86/mm/fault.c |    2 ++
> >  mm/memory.c         |   22 ++++++++++++++--------
> >  2 files changed, 16 insertions(+), 8 deletions(-)
> >
> > --- mm.orig/arch/x86/mm/fault.c
> > +++ mm/arch/x86/mm/fault.c
> > @@ -1160,6 +1160,8 @@ good_area:
> >        if (fault & VM_FAULT_RETRY) {
> >                if (retry_flag) {
> >                        retry_flag = 0;
> > +                       tsk->maj_flt++;
> > +                       tsk->min_flt--;
> >                        goto retry;
> >                }
> >                BUG();
> Sorry, a little bit confused here: are we assuming the retried fault will
> always be accounted as a minor fault?

Sure - except for some really exceptional ftruncate cases.
The page is already there, and we'll retry immediately.

maj_flt/min_flt are not _exact_ numbers by their nature, so 99.9%
accuracy should be fine.

Thanks,
Fengguang

> > --- mm.orig/mm/memory.c
> > +++ mm/mm/memory.c
> > @@ -2882,26 +2882,32 @@ int handle_mm_fault(struct mm_struct *mm
> >        pud_t *pud;
> >        pmd_t *pmd;
> >        pte_t *pte;
> > +       int ret;
> >
> >        __set_current_state(TASK_RUNNING);
> >
> > -       count_vm_event(PGFAULT);
> > -
> > -       if (unlikely(is_vm_hugetlb_page(vma)))
> > -               return hugetlb_fault(mm, vma, address, write_access);
> > +       if (unlikely(is_vm_hugetlb_page(vma))) {
> > +               ret = hugetlb_fault(mm, vma, address, write_access);
> > +               goto out;
> > +       }
> >
> > +       ret = VM_FAULT_OOM;
> >        pgd = pgd_offset(mm, address);
> >        pud = pud_alloc(mm, pgd, address);
> >        if (!pud)
> > -               return VM_FAULT_OOM;
> > +               goto out;
> >        pmd = pmd_alloc(mm, pud, address);
> >        if (!pmd)
> > -               return VM_FAULT_OOM;
> > +               goto out;
> >        pte = pte_alloc_map(mm, pmd, address);
> >        if (!pte)
> > -               return VM_FAULT_OOM;
> > +               goto out;
> >
> > -       return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
> > +       ret = handle_pte_fault(mm, vma, address, pte, pmd, write_access);
> > +out:
> > +       if (!(ret & VM_FAULT_RETRY))
> > +               count_vm_event(PGFAULT);
> > +       return ret;
> >  }
> >
> >  #ifndef __PAGETABLE_PUD_FOLDED
> >
> > --
> >
> >


* Re: [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
  2009-04-07 20:03   ` Ying Han
@ 2009-04-07 23:27     ` Wu Fengguang
  2009-04-08  1:17       ` Ying Han
  0 siblings, 1 reply; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07 23:27 UTC (permalink / raw)
  To: Ying Han; +Cc: Andrew Morton, LKML, linux-fsdevel, linux-mm

On Wed, Apr 08, 2009 at 04:03:36AM +0800, Ying Han wrote:
> On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > Cc: Ying Han <yinghan@google.com>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  mm/memory.c |    4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > --- mm.orig/mm/memory.c
> > +++ mm/mm/memory.c
> > @@ -2766,10 +2766,8 @@ static int do_linear_fault(struct mm_str
> >  {
> >        pgoff_t pgoff = (((address & PAGE_MASK)
> >                        - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> > -       int write = write_access & ~FAULT_FLAG_RETRY;
> > -       unsigned int flags = (write ? FAULT_FLAG_WRITE : 0);
> > +       unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);
> >
> > -       flags |= (write_access & FAULT_FLAG_RETRY);
> >        pte_unmap(page_table);
> >        return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
> >  }
> So, we got rid of the FAULT_FLAG_RETRY flag?

Seems yes for the current mm tree, see the following two commits.

I did this patch on seeing 761fe7bc8193b7. But a closer look
indicates that the following two patches disable the filemap
VM_FAULT_RETRY part totally...

Anyway, if these two patches are to be reverted somehow (I guess yes),
this patch should be _ignored_.

btw, do you have any test cases and performance numbers for
FAULT_FLAG_RETRY? And possible overheads for (the worst case)
sparse random mmap reads on a sparse file? I cannot find any
in your changelogs.

Thanks,
Fengguang


commit 761fe7bc8193b7858b7dc7eb4a026dc66e49fe1f
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Mon Feb 9 21:08:50 2009 +0100

    A shot in the dark :(
    
    Cc: Mike Waychison <mikew@google.com>
    Cc: Ying Han <yinghan@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index bac7d7a..1c6736d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1139,8 +1139,6 @@ good_area:
                return;
        }
 
-       write |= retry_flag;
-
        /*
         * If for any reason at all we couldn't handle the fault,
         * make sure we exit gracefully rather than endlessly redo


commit f01ca7a68c37680a4eee22a8722a713c5102b3bb
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Mon Feb 9 21:08:50 2009 +0100

    Untangle the `write' boolean from the FAULT_FLAG_foo non-boolean field.
    
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Hugh Dickins <hugh@veritas.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Cc: Mike Waychison <mikew@google.com>
    Cc: Nick Piggin <npiggin@suse.de>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Rohit Seth <rohitseth@google.com>
    Cc: Török Edwin <edwintorok@gmail.com>
    Cc: Valdis.Kletnieks@vt.edu
    Cc: Ying Han <yinghan@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b2cc88f..bac7d7a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -978,7 +978,7 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
        struct mm_struct *mm;
        int write;
        int fault;
-       unsigned int retry_flag = FAULT_FLAG_RETRY;
+       int retry_flag = 1;
 
        tsk = current;
        mm = tsk->mm;
@@ -1140,6 +1140,7 @@ good_area:
        }
 
        write |= retry_flag;
+
        /*
         * If for any reason at all we couldn't handle the fault,
         * make sure we exit gracefully rather than endlessly redo
@@ -1159,8 +1160,8 @@ good_area:
         * be removed or changed after the retry.
         */
        if (fault & VM_FAULT_RETRY) {
-               if (write & FAULT_FLAG_RETRY) {
-                       retry_flag &= ~FAULT_FLAG_RETRY;
+               if (retry_flag) {
+                       retry_flag = 0;
                        goto retry;
                }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
  2009-04-07 23:27     ` Wu Fengguang
@ 2009-04-08  1:17       ` Ying Han
  2009-04-08  2:29         ` Wu Fengguang
  0 siblings, 1 reply; 23+ messages in thread
From: Ying Han @ 2009-04-08  1:17 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: Andrew Morton, LKML, linux-fsdevel, linux-mm

On Tue, Apr 7, 2009 at 4:27 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Wed, Apr 08, 2009 at 04:03:36AM +0800, Ying Han wrote:
>> On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> > Cc: Ying Han <yinghan@google.com>
>> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
>> > ---
>> >  mm/memory.c |    4 +---
>> >  1 file changed, 1 insertion(+), 3 deletions(-)
>> >
>> > --- mm.orig/mm/memory.c
>> > +++ mm/mm/memory.c
>> > @@ -2766,10 +2766,8 @@ static int do_linear_fault(struct mm_str
>> >  {
>> >        pgoff_t pgoff = (((address & PAGE_MASK)
>> >                        - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>> > -       int write = write_access & ~FAULT_FLAG_RETRY;
>> > -       unsigned int flags = (write ? FAULT_FLAG_WRITE : 0);
>> > +       unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);
>> >
>> > -       flags |= (write_access & FAULT_FLAG_RETRY);
>> >        pte_unmap(page_table);
>> >        return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
>> >  }
>> So, we got rid of the FAULT_FLAG_RETRY flag?
>
> Seems yes for the current mm tree, see the following two commits.
>
> I wrote this patch after seeing 761fe7bc8193b7. But a closer look
> indicates that the following two commits disable the filemap
> VM_FAULT_RETRY part entirely...
>
> Anyway, if these two commits are to be reverted somehow (I guess yes),
> this patch should be _ignored_.
>
> btw, do you have any test case and performance numbers for
> FAULT_FLAG_RETRY? And possible overheads for (the worst case)
> sparse random mmap reads on a sparse file?  I cannot find any
> in your changelogs..

Here is the benchmark I posted in [V1] but somehow missed in the [V2] description:

Benchmarks:
case 1. One application has a high count of threads, each faulting in
different pages of a huge file. The benchmark indicates that the double
data structure walk on a major fault results in a << 1% performance hit.

case 2. Add another thread to the above application which runs a tight
loop of mmap()/munmap(). Here we measure the loop count in the new thread
while the other threads do the same amount of work as in case one. We got
a << 3% performance hit on the Complete Time (the benchmark value for
case one) and a 10% performance improvement on the mmap()/munmap() counter.

This patch helps a lot in cases where a writer is waiting behind all the
readers, since it can now execute much faster.
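
Roughly, the contention pattern this avoids (a simplified sketch, not
code from the patch):

	/* reader thread, without FAULT_FLAG_RETRY: */
	down_read(&mm->mmap_sem);
	/* major fault: sleeps here for the whole disk read */
	up_read(&mm->mmap_sem);

	/* writer thread, e.g. the tight mmap()/munmap() loop: */
	down_write(&mm->mmap_sem);	/* blocked behind every sleeping reader */

With FAULT_FLAG_RETRY the reader drops mmap_sem before sleeping on the
page lock, so the writer can take the sem while the I/O is in flight;
the reader then retakes mmap_sem and retries the fault.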

--Ying

>
> Thanks,
> Fengguang
>
>
> commit 761fe7bc8193b7858b7dc7eb4a026dc66e49fe1f
> Author: Andrew Morton <akpm@linux-foundation.org>
> Date:   Mon Feb 9 21:08:50 2009 +0100
>
>    A shot in the dark :(
>
>    Cc: Mike Waychison <mikew@google.com>
>    Cc: Ying Han <yinghan@google.com>
>    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index bac7d7a..1c6736d 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1139,8 +1139,6 @@ good_area:
>                return;
>        }
>
> -       write |= retry_flag;
> -
>        /*
>         * If for any reason at all we couldn't handle the fault,
>         * make sure we exit gracefully rather than endlessly redo
>
>
> commit f01ca7a68c37680a4eee22a8722a713c5102b3bb
> Author: Andrew Morton <akpm@linux-foundation.org>
> Date:   Mon Feb 9 21:08:50 2009 +0100
>
>    Untangle the `write' boolean from the FAULT_FLAG_foo non-boolean field.
>
>    Cc: "H. Peter Anvin" <hpa@zytor.com>
>    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>    Cc: David Rientjes <rientjes@google.com>
>    Cc: Hugh Dickins <hugh@veritas.com>
>    Cc: Ingo Molnar <mingo@elte.hu>
>    Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
>    Cc: Mike Waychison <mikew@google.com>
>    Cc: Nick Piggin <npiggin@suse.de>
>    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>    Cc: Rohit Seth <rohitseth@google.com>
>    Cc: Török Edwin <edwintorok@gmail.com>
>    Cc: Valdis.Kletnieks@vt.edu
>    Cc: Ying Han <yinghan@google.com>
>    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index b2cc88f..bac7d7a 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -978,7 +978,7 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
>        struct mm_struct *mm;
>        int write;
>        int fault;
> -       unsigned int retry_flag = FAULT_FLAG_RETRY;
> +       int retry_flag = 1;
>
>        tsk = current;
>        mm = tsk->mm;
> @@ -1140,6 +1140,7 @@ good_area:
>        }
>
>        write |= retry_flag;
> +
>        /*
>         * If for any reason at all we couldn't handle the fault,
>         * make sure we exit gracefully rather than endlessly redo
> @@ -1159,8 +1160,8 @@ good_area:
>         * be removed or changed after the retry.
>         */
>        if (fault & VM_FAULT_RETRY) {
> -               if (write & FAULT_FLAG_RETRY) {
> -                       retry_flag &= ~FAULT_FLAG_RETRY;
> +               if (retry_flag) {
> +                       retry_flag = 0;
>                        goto retry;
>                }
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
  2009-04-08  1:17       ` Ying Han
@ 2009-04-08  2:29         ` Wu Fengguang
  0 siblings, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2009-04-08  2:29 UTC (permalink / raw)
  To: Ying Han; +Cc: Andrew Morton, LKML, linux-fsdevel, linux-mm

Hi Ying Han,

On Wed, Apr 08, 2009 at 09:17:26AM +0800, Ying Han wrote:
> On Tue, Apr 7, 2009 at 4:27 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Wed, Apr 08, 2009 at 04:03:36AM +0800, Ying Han wrote:
> >> On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> >> > Cc: Ying Han <yinghan@google.com>
> >> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> >> > ---
> >> >  mm/memory.c |    4 +---
> >> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >> >
> >> > --- mm.orig/mm/memory.c
> >> > +++ mm/mm/memory.c
> >> > @@ -2766,10 +2766,8 @@ static int do_linear_fault(struct mm_str
> >> >  {
> >> >        pgoff_t pgoff = (((address & PAGE_MASK)
> >> >                        - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> >> > -       int write = write_access & ~FAULT_FLAG_RETRY;
> >> > -       unsigned int flags = (write ? FAULT_FLAG_WRITE : 0);
> >> > +       unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);
> >> >
> >> > -       flags |= (write_access & FAULT_FLAG_RETRY);
> >> >        pte_unmap(page_table);
> >> >        return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
> >> >  }
> >> So, we got rid of the FAULT_FLAG_RETRY flag?
> >
> > Seems yes for the current mm tree, see the following two commits.
> >
> > I wrote this patch after seeing 761fe7bc8193b7. But a closer look
> > indicates that the following two commits disable the filemap
> > VM_FAULT_RETRY part entirely...
> >
> > Anyway, if these two commits are to be reverted somehow (I guess yes),
> > this patch should be _ignored_.
> >
> > btw, do you have any test case and performance numbers for
> > FAULT_FLAG_RETRY? And possible overheads for (the worst case)
> > sparse random mmap reads on a sparse file?  I cannot find any
> > in your changelogs..
> 
> Here is the benchmark I posted in [V1] but somehow missed in the [V2] description:
> 
> Benchmarks:
> case 1. One application has a high count of threads, each faulting in
> different pages of a huge file. The benchmark indicates that the double
> data structure walk on a major fault results in a << 1% performance hit.
> 
> case 2. Add another thread to the above application which runs a tight
> loop of mmap()/munmap(). Here we measure the loop count in the new thread
> while the other threads do the same amount of work as in case one. We got
> a << 3% performance hit on the Complete Time (the benchmark value for
> case one) and a 10% performance improvement on the mmap()/munmap() counter.
> 
> This patch helps a lot in cases where a writer is waiting behind all the
> readers, since it can now execute much faster.
> 

Just tested the sparse-random-read-on-sparse-file case, and found the
performance impact to be 0.4% (8.706s vs 8.744s, averaging the three
runs of each case below). Kind of acceptable.

without FAULT_FLAG_RETRY:
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.28s user 5.39s system 99% cpu 8.692 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.17s user 5.54s system 99% cpu 8.742 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.18s user 5.48s system 99% cpu 8.684 total

FAULT_FLAG_RETRY:
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.18s user 5.63s system 99% cpu 8.825 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.22s user 5.47s system 99% cpu 8.718 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.13s user 5.55s system 99% cpu 8.690 total

In the above synthetic workload, the mmap read page offsets are loaded from
stride-100 and replayed against /mnt/btrfs-ram/sparse; the two files are
created by:

                seq 0 100 1000000 > stride-100
                dd if=/dev/zero of=/mnt/btrfs-ram/sparse bs=1M count=1 seek=1024000
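
For reference, a rough C equivalent of the replay loop (the real test
used iotrace.rb; the hard-coded 4KB page size and the missing error
handling are simplifications):

	#include <stdio.h>
	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		FILE *trace = fopen("stride-100", "r");
		int fd = open("/mnt/btrfs-ram/sparse", O_RDONLY);
		off_t size = lseek(fd, 0, SEEK_END);
		char *map = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
		unsigned long pgoff;
		volatile char c;

		/* touch one byte per listed page offset: every miss is a
		 * sparse random mmap read going through filemap_fault() */
		while (fscanf(trace, "%lu", &pgoff) == 1)
			c = map[pgoff * 4096UL];

		munmap(map, size);
		close(fd);
		fclose(trace);
		return 0;
	}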

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead
  2009-04-07 11:50 ` [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead Wu Fengguang
@ 2009-04-07 15:50   ` Linus Torvalds
  0 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2009-04-07 15:50 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Benjamin Herrenschmidt, Pavel Levshin, wli,
	Nick Piggin, David Rientjes, Hugh Dickins, Ingo Molnar,
	Lee Schermerhorn, Mike Waychison, Peter Zijlstra, Rohit Seth,
	Edwin, H. Peter Anvin, Ying Han, LKML, linux-mm, linux-fsdevel



On Tue, 7 Apr 2009, Wu Fengguang wrote:
>
> From: Linus Torvalds <torvalds@linux-foundation.org>
> 
> This shouldn't really change behavior all that much, but the single
> rather complex function with read-ahead inside a loop etc is broken up
> into more manageable pieces.

Heh. That's an old patch.

Anyway, ACK on the whole series (or at least the pieces of it that were 
cc'd to me). Looks like sane cleanups, and I don't mean just my own old 
patch ;)

		Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead
  2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
  2009-04-07 15:50   ` Linus Torvalds
  0 siblings, 1 reply; 23+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Benjamin Herrenschmidt, Pavel Levshin, wli, Nick Piggin,
	Wu Fengguang, Linus Torvalds, David Rientjes, Hugh Dickins,
	Ingo Molnar, Lee Schermerhorn, Mike Waychison, Peter Zijlstra,
	Rohit Seth, Edwin, H. Peter Anvin, Ying Han, LKML, linux-mm,
	linux-fsdevel

[-- Attachment #1: readahead-mmap-split-code-and-cleanup.patch --]
[-- Type: text/plain, Size: 9841 bytes --]

From: Linus Torvalds <torvalds@linux-foundation.org>

This shouldn't really change behavior all that much, but the single
rather complex function with read-ahead inside a loop etc is broken up
into more manageable pieces.

The behaviour is also less subtle, with the read-ahead being done up-front
rather than inside a convoluted loop, thus avoiding the now-unnecessary
extra state variables (i.e. "did_readaround" is gone).

Fengguang: the code split in fact fixed a bug reported by Pavel Levshin:
the PGMAJFAULT accounting used to be bypassed when MADV_RANDOM is set, in
which case the original code would jump directly to the no_cached_page path.

Cc: Pavel Levshin <lpk@581.spb.su>
Cc: wli@movementarian.org
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Ok, so this is something I did in Mexico when I wasn't scuba-diving, and 
was "watching" the kids at the pool. It was brought on by looking at git 
mmap file behaviour under cold-cache behaviour: git does ok, but my laptop 
disk is really slow, and I tried to verify that the kernel did a 
reasonable job of read-ahead when taking page faults.

I think it did, but quite frankly, the filemap_fault() code was totally 
unreadable. So this separates out the read-ahead cases, and adds more 
comments, and also changes it so that we do asynchronous read-ahead 
*before* we actually wait for the page to become unlocked.

Not that it seems to make any real difference on my laptop, but I really 
hated how it was doing a

	page = get_lock_page(..)

and then doing read-ahead after that: which just guarantees that we have 
to wait for any outstanding IO on "page" to complete before we can even 
submit any new read-ahead! That just seems totally broken!
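
In sketch form (condensed from the patch below):

	/* old order: */
	page = find_lock_page(mapping, pgoff);	/* may sleep on page I/O */
	if (PageReadahead(page))		/* submitted only after the wait */
		page_cache_async_readahead(mapping, ra, file, page, pgoff, 1);

	/* new order: */
	page = find_get_page(mapping, pgoff);	/* no waiting */
	if (page && PageReadahead(page))	/* queue more I/O up-front */
		page_cache_async_readahead(mapping, ra, file, page, pgoff, 1);
	lock_page(page);			/* only now wait for the page */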

So it replaces the "get_lock_page()" at the top with a broken-out page 
cache lookup, which allows us to look at the page state flags and make 
appropriate decisions on what we should do without waiting for the locked 
bit to clear.

It does add many more lines than it removes:

	 mm/filemap.c |  192 +++++++++++++++++++++++++++++++++++++++-------------------
	 1 files changed, 130 insertions(+), 62 deletions(-)

but that's largely due to (a) the new function headers etc from the 
split-up and (b) new or extended comments, especially about the helper 
functions. The code, in many ways, is actually simpler, apart from the 
fairly trivial expansion of the equivalent of "get_lock_page()" into the 
function.

Comments? I tried to avoid changing the read-ahead logic itself, although 
the old code did some strange things like doing *both* async readahead and 
then looking up the page and doing sync readahead (which I think was just 
due to the code being so damn messily organized, not on purpose).

			Linus

---
 mm/filemap.c |  164 ++++++++++++++++++++++++++-----------------------
 1 file changed, 90 insertions(+), 74 deletions(-)

--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1524,6 +1524,68 @@ static int page_cache_read(struct file *
 
 #define MMAP_LOTSAMISS  (100)
 
+/*
+ * Synchronous readahead happens when we don't even find
+ * a page in the page cache at all.
+ */
+static void do_sync_mmap_readahead(struct vm_area_struct *vma,
+				   struct file_ra_state *ra,
+				   struct file *file,
+				   pgoff_t offset)
+{
+	unsigned long ra_pages;
+	struct address_space *mapping = file->f_mapping;
+
+	/* If we don't want any read-ahead, don't bother */
+	if (VM_RandomReadHint(vma))
+		return;
+
+	if (VM_SequentialReadHint(vma)) {
+		page_cache_sync_readahead(mapping, ra, file, offset, 1);
+		return;
+	}
+
+	if (ra->mmap_miss < INT_MAX)
+		ra->mmap_miss++;
+
+	/*
+	 * Do we miss much more than hit in this file? If so,
+	 * stop bothering with read-ahead. It will only hurt.
+	 */
+	if (ra->mmap_miss > MMAP_LOTSAMISS)
+		return;
+
+	ra_pages = max_sane_readahead(ra->ra_pages);
+	if (ra_pages) {
+		pgoff_t start = 0;
+
+		if (offset > ra_pages / 2)
+			start = offset - ra_pages / 2;
+		do_page_cache_readahead(mapping, file, start, ra_pages);
+	}
+}
+
+/*
+ * Asynchronous readahead happens when we find the page and PG_readahead,
+ * so we want to possibly extend the readahead further..
+ */
+static void do_async_mmap_readahead(struct vm_area_struct *vma,
+				    struct file_ra_state *ra,
+				    struct file *file,
+				    struct page *page,
+				    pgoff_t offset)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	/* If we don't want any read-ahead, don't bother */
+	if (VM_RandomReadHint(vma))
+		return;
+	if (ra->mmap_miss > 0)
+		ra->mmap_miss--;
+	if (PageReadahead(page))
+		page_cache_async_readahead(mapping, ra, file, page, offset, 1);
+}
+
 /**
  * filemap_fault - read in file data for page fault handling
  * @vma:	vma in which the fault was taken
@@ -1543,80 +1605,43 @@ int filemap_fault(struct vm_area_struct 
 	struct address_space *mapping = file->f_mapping;
 	struct file_ra_state *ra = &file->f_ra;
 	struct inode *inode = mapping->host;
+	pgoff_t offset = vmf->pgoff;
 	struct page *page;
 	pgoff_t size;
-	int did_readaround = 0;
 	int ret = 0;
 	int retry_flag = vmf->flags & FAULT_FLAG_RETRY;
 	int retry_ret;
 
 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	if (vmf->pgoff >= size)
+	if (offset >= size)
 		return VM_FAULT_SIGBUS;
 
-	/* If we don't want any read-ahead, don't bother */
-	if (VM_RandomReadHint(vma))
-		goto no_cached_page;
-
 	/*
 	 * Do we have something in the page cache already?
 	 */
-retry_find:
-	page = find_lock_page(mapping, vmf->pgoff);
-
-	/*
-	 * For sequential accesses, we use the generic readahead logic.
-	 */
-	if (VM_SequentialReadHint(vma)) {
-		if (!page) {
-			page_cache_sync_readahead(mapping, ra, file,
-							   vmf->pgoff, 1);
-			retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
-						vma, &page, retry_flag);
-			if (retry_ret == VM_FAULT_RETRY) {
-				ra->mmap_miss++; /* counteract the followed retry hit */
-				return retry_ret;
-			}
-			if (!page)
-				goto no_cached_page;
-		}
-		if (PageReadahead(page)) {
-			page_cache_async_readahead(mapping, ra, file, page,
-							   vmf->pgoff, 1);
-		}
-	}
-
-	if (!page) {
-		unsigned long ra_pages;
-
-		ra->mmap_miss++;
+	page = find_get_page(mapping, offset);
 
+	if (likely(page)) {
 		/*
-		 * Do we miss much more than hit in this file? If so,
-		 * stop bothering with read-ahead. It will only hurt.
+		 * We found the page, so try async readahead before
+		 * waiting for the lock.
 		 */
-		if (ra->mmap_miss > MMAP_LOTSAMISS)
-			goto no_cached_page;
+		do_async_mmap_readahead(vma, ra, file, page, offset);
+		lock_page(page);
 
-		/*
-		 * To keep the pgmajfault counter straight, we need to
-		 * check did_readaround, as this is an inner loop.
-		 */
-		if (!did_readaround) {
-			ret = VM_FAULT_MAJOR;
-			count_vm_event(PGMAJFAULT);
-		}
-		did_readaround = 1;
-		ra_pages = max_sane_readahead(file->f_ra.ra_pages);
-		if (ra_pages) {
-			pgoff_t start = 0;
-
-			if (vmf->pgoff > ra_pages / 2)
-				start = vmf->pgoff - ra_pages / 2;
-			do_page_cache_readahead(mapping, file, start, ra_pages);
+		/* Did it get truncated? */
+		if (unlikely(page->mapping != mapping)) {
+			unlock_page(page);
+			put_page(page);
+			goto no_cached_page;
 		}
-retry_find_retry:
-		retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
+	} else {
+		/* No page in the page cache at all */
+		do_sync_mmap_readahead(vma, ra, file, offset);
+		count_vm_event(PGMAJFAULT);
+		ret = VM_FAULT_MAJOR;
+retry_find:
+		retry_ret = find_lock_page_retry(mapping, offset,
 				vma, &page, retry_flag);
 		if (retry_ret == VM_FAULT_RETRY) {
 			ra->mmap_miss++; /* counteract the followed retry hit */
@@ -1626,9 +1651,6 @@ retry_find_retry:
 			goto no_cached_page;
 	}
 
-	if (!did_readaround)
-		ra->mmap_miss--;
-
 	/*
 	 * We have a locked page in the page cache, now we need to check
 	 * that it's up-to-date. If not, it is going to be due to an error.
@@ -1636,19 +1658,19 @@ retry_find_retry:
 	if (unlikely(!PageUptodate(page)))
 		goto page_not_uptodate;
 
-	/* Must recheck i_size under page lock */
+	/*
+	 * Found the page and have a reference on it.
+	 * We must recheck i_size under page lock.
+	 */
 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	if (unlikely(vmf->pgoff >= size)) {
+	if (unlikely(offset >= size)) {
 		unlock_page(page);
 		page_cache_release(page);
 		return VM_FAULT_SIGBUS;
 	}
 
-	/*
-	 * Found the page and have a reference on it.
-	 */
 	update_page_reclaim_stat(page);
-	ra->prev_pos = (loff_t)page->index << PAGE_CACHE_SHIFT;
+	ra->prev_pos = (loff_t)offset << PAGE_CACHE_SHIFT;
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;
 
@@ -1657,7 +1679,7 @@ no_cached_page:
 	 * We're only likely to ever get here if MADV_RANDOM is in
 	 * effect.
 	 */
-	error = page_cache_read(file, vmf->pgoff);
+	error = page_cache_read(file, offset);
 
 	/*
 	 * The page we want has now been added to the page cache.
@@ -1665,7 +1687,7 @@ no_cached_page:
 	 * meantime, we'll just come back here and read it again.
 	 */
 	if (error >= 0)
-		goto retry_find_retry;
+		goto retry_find;
 
 	/*
 	 * An error return from page_cache_read can result if the
@@ -1677,12 +1699,6 @@ no_cached_page:
 	return VM_FAULT_SIGBUS;
 
 page_not_uptodate:
-	/* IO error path */
-	if (!did_readaround) {
-		ret = VM_FAULT_MAJOR;
-		count_vm_event(PGMAJFAULT);
-	}
-
 	/*
 	 * Umm, take care of errors if the page isn't up-to-date.
 	 * Try to re-read it _once_. We do this synchronously,

-- 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-04-08  2:29 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-07  7:17 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
2009-04-07  7:17 ` [PATCH 01/14] mm: fix find_lock_page_retry() return value parsing Wu Fengguang
2009-04-07  7:17 ` [PATCH 02/14] mm: fix major/minor fault accounting on retried fault Wu Fengguang
2009-04-07 19:58   ` Ying Han
2009-04-07 22:45     ` Wu Fengguang
2009-04-07  7:17 ` [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code Wu Fengguang
2009-04-07 20:03   ` Ying Han
2009-04-07 23:27     ` Wu Fengguang
2009-04-08  1:17       ` Ying Han
2009-04-08  2:29         ` Wu Fengguang
2009-04-07  7:17 ` [PATCH 04/14] mm: reduce duplicate page fault code Wu Fengguang
2009-04-07  7:17 ` [PATCH 05/14] readahead: account mmap_miss for VM_FAULT_RETRY Wu Fengguang
2009-04-07  7:17 ` [PATCH 06/14] readahead: move max_sane_readahead() calls into force_page_cache_readahead() Wu Fengguang
2009-04-07  7:17 ` [PATCH 07/14] readahead: apply max_sane_readahead() limit in ondemand_readahead() Wu Fengguang
2009-04-07  7:17 ` [PATCH 08/14] readahead: remove one unnecessary radix tree lookup Wu Fengguang
2009-04-07  7:17 ` [PATCH 09/14] readahead: increase interleaved readahead size Wu Fengguang
2009-04-07  7:17 ` [PATCH 10/14] readahead: remove sync/async readahead call dependency Wu Fengguang
2009-04-07  7:17 ` [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead Wu Fengguang
2009-04-07  7:17 ` [PATCH 12/14] readahead: sequential mmap readahead Wu Fengguang
2009-04-07  7:17 ` [PATCH 13/14] readahead: enforce full readahead size on async " Wu Fengguang
2009-04-07  7:17 ` [PATCH 14/14] readahead: record mmap read-around states in file_ra_state Wu Fengguang
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
2009-04-07 11:50 ` [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead Wu Fengguang
2009-04-07 15:50   ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).