linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/9] mmap read-around and readahead
       [not found] <20071216115927.986126305@mail.ustc.edu.cn>
@ 2007-12-16 11:59 ` Fengguang Wu
  2007-12-16 23:35   ` Linus Torvalds
       [not found] ` <20071216120417.586021813@mail.ustc.edu.cn>
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Fengguang Wu @ 2007-12-16 11:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

Andrew,

Here are the mmap read-around related patches initiated by Linus.
They are for linux-2.6.24-rc4-mm1.  The one major new feature -
auto detection and early readahead for mmap sequential reads - runs
as expected on my desktop :-)


[PATCH 1/9] readahead: simplify readahead call scheme
[PATCH 2/9] readahead: clean up and simplify the code for filemap page fault readahead
[PATCH 3/9] readahead: auto detection of sequential mmap reads
[PATCH 4/9] readahead: quick startup on sequential mmap readahead
[PATCH 5/9] readahead: make ra_submit() non-static
[PATCH 6/9] readahead: save mmap read-around states in file_ra_state
[PATCH 7/9] readahead: remove unused do_page_cache_readahead()
[PATCH 8/9] readahead: move max_sane_readahead() calls into force_page_cache_readahead()
[PATCH 9/9] readahead: call max_sane_readahead() in ondemand_readahead()

Thank you,
Fengguang
-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/9] readahead: simplify readahead call scheme
       [not found] ` <20071216120417.586021813@mail.ustc.edu.cn>
@ 2007-12-16 11:59   ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-16 11:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

[-- Attachment #1: readahead-merge-sync-async.patch --]
[-- Type: text/plain, Size: 1668 bytes --]

It is insane and error-prone to insist that call sites check for async
readahead after doing a sync one. I.e. whenever someone does a sync readahead:

				if (!page)
					page_cache_sync_readahead(...);

He must try async readahead, too:

				page = find_get_page(...);
				if (PageReadahead(page))
					page_cache_async_readahead(...);

The tricky point is that PG_readahead could be set by a sync readahead for the
_current_ newly faulted in page, and the readahead code simply expects one more
callback to handle it. If the caller fails to do so, it will miss the
PG_readahead bit and never be able to start an async readahead.

Avoid it by piggy-backing the async part _inside_ the readahead code.

Now if an async readahead should be started immediately after a sync one,
the readahead logic itself will do it. So the following code becomes valid:

				if (!page)
					page_cache_sync_readahead(...);
				else if (PageReadahead(page))
					page_cache_async_readahead(...);
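The piggy-backed doubling can be sketched in user space (hypothetical struct and field names, sizes in pages; the real state lives in struct file_ra_state):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the readahead window in
 * struct file_ra_state (sizes in pages). */
struct ra_window {
	unsigned long size;		/* pages in the current window */
	unsigned long async_size;	/* pages left when PG_readahead fires */
};

/*
 * Sketch of the piggy-backed async part: when a sync readahead would
 * put the PG_readahead marker on the very page being faulted in right
 * now (the whole window is async), double the window so the async part
 * is issued here instead of relying on one more callback by the caller.
 */
static void piggyback_async(struct ra_window *ra, bool hit_readahead_marker)
{
	if (!hit_readahead_marker && ra->size == ra->async_size)
		ra->size *= 2;
}
```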

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 mm/readahead.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- linux-2.6.24-rc4-mm1.orig/mm/readahead.c
+++ linux-2.6.24-rc4-mm1/mm/readahead.c
@@ -402,6 +402,14 @@ ondemand_readahead(struct address_space 
 	ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
 readit:
+	/*
+	 * An async readahead should be triggered immediately.
+	 * Instead of demanding all call sites to check for async readahead
+	 * immediately after a sync one, start the async part now and here.
+	 */
+	if (!hit_readahead_marker && ra->size == ra->async_size)
+		ra->size *= 2;
+
 	return ra_submit(ra, mapping, filp);
 }
 

-- 


* [PATCH 2/9] readahead: clean up and simplify the code for filemap page fault readahead
       [not found] ` <20071216120417.748714367@mail.ustc.edu.cn>
@ 2007-12-16 11:59   ` Fengguang Wu
  2007-12-18  8:19   ` Nick Piggin
  1 sibling, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-16 11:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, Nick Piggin, Andrew Morton, linux-kernel

[-- Attachment #1: readahead-standalone-mmap-readaround.patch --]
[-- Type: text/plain, Size: 9522 bytes --]

From: Linus Torvalds <torvalds@linux-foundation.org>

This shouldn't really change behavior all that much, but the single
rather complex function with read-ahead inside a loop etc is broken up
into more manageable pieces.

The behaviour is also less subtle, with the read-ahead being done up-front 
rather than inside some subtle loop and thus avoiding the now unnecessary 
extra state variables (ie "did_readaround" is gone).

Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Ok, so this is something I did in Mexico when I wasn't scuba-diving, and 
was "watching" the kids at the pool. It was brought on by looking at git 
mmap file behaviour under cold-cache conditions: git does ok, but my laptop 
disk is really slow, and I tried to verify that the kernel did a 
reasonable job of read-ahead when taking page faults.

I think it did, but quite frankly, the filemap_fault() code was totally 
unreadable. So this separates out the read-ahead cases, and adds more 
comments, and also changes it so that we do asynchronous read-ahead 
*before* we actually wait for the page we are waiting for to become 
unlocked.

Not that it seems to make any real difference on my laptop, but I really 
hated how it was doing a

	page = find_lock_page(..)

and then doing read-ahead after that: which just guarantees that we have 
to wait for any outstanding IO on "page" to complete before we can even 
submit any new read-ahead! That just seems totally broken!

So it replaces the "find_lock_page()" at the top with a broken-out page 
cache lookup, which allows us to look at the page state flags and make 
appropriate decisions on what we should do without waiting for the locked 
bit to clear.

It does add many more lines than it removes:

	 mm/filemap.c |  192 +++++++++++++++++++++++++++++++++++++++-------------------
	 1 files changed, 130 insertions(+), 62 deletions(-)

but that's largely due to (a) the new function headers etc due to the 
split-up and (b) new or extended comments especially about the helper 
functions. The code, in many ways, is actually simpler, apart from the 
fairly trivial expansion of the equivalent of "find_lock_page()" into the 
function.

Comments? I tried to avoid changing the read-ahead logic itself, although 
the old code did some strange things like doing *both* async readahead and 
then looking up the page and doing sync readahead (which I think was just 
due to the code being so damn messily organized, not on purpose).

			Linus

---
 mm/filemap.c |  190 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 128 insertions(+), 62 deletions(-)

--- linux-2.6.24-rc4-mm1.orig/mm/filemap.c
+++ linux-2.6.24-rc4-mm1/mm/filemap.c
@@ -1302,6 +1302,86 @@ static int fastcall page_cache_read(stru
 
 #define MMAP_LOTSAMISS  (100)
 
+/*
+ * Synchronous readahead happens when we don't even find
+ * a page in the page cache at all.
+ */
+static void do_sync_mmap_readahead(struct vm_area_struct *vma,
+				   struct file_ra_state *ra,
+				   struct file *file,
+				   pgoff_t offset)
+{
+	unsigned long ra_pages;
+	struct address_space *mapping = file->f_mapping;
+
+	/* If we don't want any read-ahead, don't bother */
+	if (VM_RandomReadHint(vma))
+		return;
+
+	if (VM_SequentialReadHint(vma)) {
+		page_cache_sync_readahead(mapping, ra, file, offset, 1);
+		return;
+	}
+
+	ra->mmap_miss++;
+
+	/*
+	 * Do we miss much more than hit in this file? If so,
+	 * stop bothering with read-ahead. It will only hurt.
+	 */
+	if (ra->mmap_miss > MMAP_LOTSAMISS)
+		return;
+
+	ra_pages = max_sane_readahead(file->f_ra.ra_pages);
+	if (ra_pages) {
+		pgoff_t start = 0;
+
+		if (offset > ra_pages / 2)
+			start = offset - ra_pages / 2;
+		do_page_cache_readahead(mapping, file, start, ra_pages);
+	}
+}
+
+/*
+ * Asynchronous readahead happens when we find the page,
+ * but it is busy being read, so we want to possibly
+ * extend the readahead further..
+ */
+static void do_async_mmap_readahead(struct vm_area_struct *vma,
+				    struct file_ra_state *ra,
+				    struct file *file,
+				    struct page *page,
+				    pgoff_t offset)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	/* If we don't want any read-ahead, don't bother */
+	if (VM_RandomReadHint(vma))
+		return;
+	if (ra->mmap_miss > 0)
+		ra->mmap_miss--;
+	if (PageReadahead(page))
+		page_cache_async_readahead(mapping, ra, file, page, offset, 1);
+}
+
+/*
+ * A successful mmap hit is when we didn't need any IO at all,
+ * and got an immediate lock on an up-to-date page. There's not
+ * much to do, except decide on whether we want to trigger read-
+ * ahead.
+ *
+ * We currently do the same thing as we did for a locked page
+ * that we're waiting for.
+ */
+static void do_mmap_hit(struct vm_area_struct *vma,
+			struct file_ra_state *ra,
+			struct file *file,
+			struct page *page,
+			pgoff_t offset)
+{
+	do_async_mmap_readahead(vma, ra, file, page, offset);
+}
+
 /**
  * filemap_fault - read in file data for page fault handling
  * @vma:	vma in which the fault was taken
@@ -1321,78 +1401,69 @@ int filemap_fault(struct vm_area_struct 
 	struct address_space *mapping = file->f_mapping;
 	struct file_ra_state *ra = &file->f_ra;
 	struct inode *inode = mapping->host;
+	pgoff_t offset = vmf->pgoff;
 	struct page *page;
 	unsigned long size;
-	int did_readaround = 0;
 	int ret = 0;
 
 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if (vmf->pgoff >= size)
 		return VM_FAULT_SIGBUS;
 
-	/* If we don't want any read-ahead, don't bother */
-	if (VM_RandomReadHint(vma))
-		goto no_cached_page;
-
 	/*
-	 * Do we have something in the page cache already?
+	 * Do we have something in the page cache already that
+	 * is unlocked and already up-to-date?
 	 */
-retry_find:
-	page = find_lock_page(mapping, vmf->pgoff);
-	/*
-	 * For sequential accesses, we use the generic readahead logic.
-	 */
-	if (VM_SequentialReadHint(vma)) {
-		if (!page) {
-			page_cache_sync_readahead(mapping, ra, file,
-							   vmf->pgoff, 1);
-			page = find_lock_page(mapping, vmf->pgoff);
-			if (!page)
-				goto no_cached_page;
-		}
-		if (PageReadahead(page)) {
-			page_cache_async_readahead(mapping, ra, file, page,
-							   vmf->pgoff, 1);
-		}
-	}
+	read_lock_irq(&mapping->tree_lock);
+	page = radix_tree_lookup(&mapping->page_tree, offset);
+	if (likely(page)) {
+		int got_lock, uptodate;
+		page_cache_get(page);
+
+		got_lock = !TestSetPageLocked(page);
+		uptodate = PageUptodate(page);
+		read_unlock_irq(&mapping->tree_lock);
 
-	if (!page) {
-		unsigned long ra_pages;
+		if (likely(got_lock)) {
+			/*
+			 * Previous IO error? No read-ahead, but try to
+			 * re-do a single read.
+			 */
+			if (unlikely(!uptodate))
+				goto page_not_uptodate;
 
-		ra->mmap_miss++;
+			do_mmap_hit(vma, ra, file, page, offset);
+			goto found_it;
+		}
 
 		/*
-		 * Do we miss much more than hit in this file? If so,
-		 * stop bothering with read-ahead. It will only hurt.
+		 * We found the page, but it was locked..
+		 *
+		 * So do async readahead and wait for it to
+		 * unlock.
 		 */
-		if (ra->mmap_miss > MMAP_LOTSAMISS)
-			goto no_cached_page;
+		do_async_mmap_readahead(vma, ra, file, page, offset);
+		lock_page(page);
 
-		/*
-		 * To keep the pgmajfault counter straight, we need to
-		 * check did_readaround, as this is an inner loop.
-		 */
-		if (!did_readaround) {
-			ret = VM_FAULT_MAJOR;
-			count_vm_event(PGMAJFAULT);
-		}
-		did_readaround = 1;
-		ra_pages = max_sane_readahead(file->f_ra.ra_pages);
-		if (ra_pages) {
-			pgoff_t start = 0;
-
-			if (vmf->pgoff > ra_pages / 2)
-				start = vmf->pgoff - ra_pages / 2;
-			do_page_cache_readahead(mapping, file, start, ra_pages);
+		/* Did it get truncated? */
+		if (unlikely(page->mapping != mapping)) {
+			unlock_page(page);
+			put_page(page);
+			goto no_cached_page;
 		}
-		page = find_lock_page(mapping, vmf->pgoff);
+	} else {
+		read_unlock_irq(&mapping->tree_lock);
+
+		/* No page in the page cache at all */
+		do_sync_mmap_readahead(vma, ra, file, offset);
+		ret = VM_FAULT_MAJOR;
+		count_vm_event(PGMAJFAULT);
+retry_find:
+		page = find_lock_page(mapping, offset);
 		if (!page)
 			goto no_cached_page;
 	}
 
-	if (!did_readaround)
-		ra->mmap_miss--;
-
 	/*
 	 * We have a locked page in the page cache, now we need to check
 	 * that it's up-to-date. If not, it is going to be due to an error.
@@ -1400,7 +1471,11 @@ retry_find:
 	if (unlikely(!PageUptodate(page)))
 		goto page_not_uptodate;
 
-	/* Must recheck i_size under page lock */
+	/*
+	 * Found the page and have a reference on it.
+	 * We must recheck i_size under page lock
+	 */
+found_it:
 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if (unlikely(vmf->pgoff >= size)) {
 		unlock_page(page);
@@ -1408,11 +1483,8 @@ retry_find:
 		return VM_FAULT_SIGBUS;
 	}
 
-	/*
-	 * Found the page and have a reference on it.
-	 */
 	mark_page_accessed(page);
-	ra->prev_pos = (loff_t)page->index << PAGE_CACHE_SHIFT;
+	ra->prev_pos = (loff_t)offset << PAGE_CACHE_SHIFT;
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;
 
@@ -1441,12 +1513,6 @@ no_cached_page:
 	return VM_FAULT_SIGBUS;
 
 page_not_uptodate:
-	/* IO error path */
-	if (!did_readaround) {
-		ret = VM_FAULT_MAJOR;
-		count_vm_event(PGMAJFAULT);
-	}
-
 	/*
 	 * Umm, take care of errors if the page isn't up-to-date.
 	 * Try to re-read it _once_. We do this synchronously,

-- 


* [PATCH 3/9] readahead: auto detection of sequential mmap reads
       [not found] ` <20071216120417.905514988@mail.ustc.edu.cn>
@ 2007-12-16 11:59   ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-16 11:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

[-- Attachment #1: readahead-auto-detect-mmap-sequential-reads.patch --]
[-- Type: text/plain, Size: 1295 bytes --]

Auto-detect sequential mmap reads and do sync/async readahead for them.

The sequential mmap readahead will be triggered when
- sync readahead: it's a major fault and (prev_offset==offset-1);
- async readahead: minor fault on PG_readahead page with valid readahead state.

It's a bit conservative to require valid readahead state for async readahead,
which means we don't do readahead for interleaved reads for now, but let's make
it safe for this initial try.
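The two trigger conditions can be sketched in user space like this (hypothetical struct and field names, all values in pages; the real code keeps prev_pos in bytes and compares offset against prev_pos >> PAGE_CACHE_SHIFT):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified readahead state (all values in pages). */
struct ra_state {
	unsigned long start;		/* start of the last readahead window */
	unsigned long size;		/* pages in that window */
	unsigned long async_size;	/* pages left when PG_readahead fires */
	unsigned long prev_index;	/* page index of the previous access */
};

/* Sync case: a major fault on the page right after the previous one. */
static bool sync_sequential(const struct ra_state *ra, unsigned long offset)
{
	return offset - 1 == ra->prev_index;
}

/*
 * Async case: a minor fault on the PG_readahead marker page, which the
 * readahead code places async_size pages before the window end. This is
 * the "valid readahead state" check that skips interleaved reads.
 */
static bool async_sequential(const struct ra_state *ra, unsigned long offset,
			     bool page_readahead)
{
	return page_readahead &&
	       offset == ra->start + ra->size - ra->async_size;
}
```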

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 mm/filemap.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- linux-2.6.24-rc4-mm1.orig/mm/filemap.c
+++ linux-2.6.24-rc4-mm1/mm/filemap.c
@@ -1318,7 +1318,8 @@ static void do_sync_mmap_readahead(struc
 	if (VM_RandomReadHint(vma))
 		return;
 
-	if (VM_SequentialReadHint(vma)) {
+	if (VM_SequentialReadHint(vma) ||
+			offset - 1 == (ra->prev_pos >> PAGE_CACHE_SHIFT)) {
 		page_cache_sync_readahead(mapping, ra, file, offset, 1);
 		return;
 	}
@@ -1360,7 +1361,8 @@ static void do_async_mmap_readahead(stru
 		return;
 	if (ra->mmap_miss > 0)
 		ra->mmap_miss--;
-	if (PageReadahead(page))
+	if (PageReadahead(page) &&
+			offset == ra->start + ra->size - ra->async_size)
 		page_cache_async_readahead(mapping, ra, file, page, offset, 1);
 }
 

-- 


* [PATCH 4/9] readahead: quick startup on sequential mmap readahead
       [not found] ` <20071216120418.055796608@mail.ustc.edu.cn>
@ 2007-12-16 11:59   ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-16 11:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

[-- Attachment #1: readahead-sequential-quick-start.patch --]
[-- Type: text/plain, Size: 851 bytes --]

When the user explicitly sets MADV_SEQUENTIAL, we should really avoid the slow
readahead size ramp-up phase and start full-size readahead immediately.

This patch won't change behavior for the auto-detected sequential mmap reads:
their previous read-around size is already ra_pages/2, so it will be doubled
to the full readahead size anyway.
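As a sanity check on that claim, here is the doubling arithmetic in a user-space sketch (hypothetical helper name; the real ramp-up lives in the on-demand readahead code):

```c
#include <assert.h>

/*
 * Hypothetical sketch of the size ramp-up described above: each window
 * doubles the previous one, capped at the full readahead size. Starting
 * from the read-around size (ra_pages / 2), a single step already
 * reaches ra_pages.
 */
static unsigned long next_ra_size(unsigned long cur, unsigned long ra_pages)
{
	unsigned long next = cur * 2;

	return next < ra_pages ? next : ra_pages;
}
```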

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 mm/filemap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.24-rc4-mm1.orig/mm/filemap.c
+++ linux-2.6.24-rc4-mm1/mm/filemap.c
@@ -1320,7 +1320,7 @@ static void do_sync_mmap_readahead(struc
 
 	if (VM_SequentialReadHint(vma) ||
 			offset - 1 == (ra->prev_pos >> PAGE_CACHE_SHIFT)) {
-		page_cache_sync_readahead(mapping, ra, file, offset, 1);
+		page_cache_sync_readahead(mapping, ra, file, offset, ra->ra_pages);
 		return;
 	}
 

-- 


* [PATCH 5/9] readahead: make ra_submit() non-static
       [not found] ` <20071216120418.201213716@mail.ustc.edu.cn>
@ 2007-12-16 11:59   ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-16 11:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

[-- Attachment #1: readahead-export-ra_submit.patch --]
[-- Type: text/plain, Size: 1086 bytes --]

Make ra_submit() non-static and callable from other files.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 include/linux/mm.h |    3 +++
 mm/readahead.c     |    2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

--- linux-2.6.24-rc4-mm1.orig/include/linux/mm.h
+++ linux-2.6.24-rc4-mm1/include/linux/mm.h
@@ -1103,6 +1103,9 @@ void page_cache_async_readahead(struct a
 				unsigned long size);
 
 unsigned long max_sane_readahead(unsigned long nr);
+unsigned long ra_submit(struct file_ra_state *ra,
+		        struct address_space *mapping,
+			struct file *filp);
 
 /* Do stack extension */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
--- linux-2.6.24-rc4-mm1.orig/mm/readahead.c
+++ linux-2.6.24-rc4-mm1/mm/readahead.c
@@ -242,7 +242,7 @@ subsys_initcall(readahead_init);
 /*
  * Submit IO for the read-ahead request in file_ra_state.
  */
-static unsigned long ra_submit(struct file_ra_state *ra,
+unsigned long ra_submit(struct file_ra_state *ra,
 		       struct address_space *mapping, struct file *filp)
 {
 	int actual;

-- 


* [PATCH 6/9] readahead: save mmap read-around states in file_ra_state
       [not found] ` <20071216120418.360445517@mail.ustc.edu.cn>
@ 2007-12-16 11:59   ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-16 11:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

[-- Attachment #1: readahead-convert-mmap-readaround.patch --]
[-- Type: text/plain, Size: 881 bytes --]

Change mmap read-around to share the same code style and data structure
with readahead code.
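The read-around window in the hunk below is centered on the faulting offset; the start calculation can be sketched in user space like this (hypothetical helper name):

```c
#include <assert.h>

/*
 * Hypothetical sketch of the ra->start computation: center the
 * ra_pages-sized window on the faulting offset, clamped so it never
 * starts before page 0 (which is what max_t(long, 0, ...) achieves).
 */
static unsigned long readaround_start(unsigned long offset,
				      unsigned long ra_pages)
{
	long start = (long)offset - (long)(ra_pages / 2);

	return start > 0 ? (unsigned long)start : 0;
}
```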

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 mm/filemap.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

--- linux-2.6.24-rc4-mm1.orig/mm/filemap.c
+++ linux-2.6.24-rc4-mm1/mm/filemap.c
@@ -1333,13 +1333,15 @@ static void do_sync_mmap_readahead(struc
 	if (ra->mmap_miss > MMAP_LOTSAMISS)
 		return;
 
-	ra_pages = max_sane_readahead(file->f_ra.ra_pages);
+	/*
+	 * mmap read-around
+	 */
+	ra_pages = max_sane_readahead(ra->ra_pages);
 	if (ra_pages) {
-		pgoff_t start = 0;
-
-		if (offset > ra_pages / 2)
-			start = offset - ra_pages / 2;
-		do_page_cache_readahead(mapping, file, start, ra_pages);
+		ra->start = max_t(long, 0, offset - ra_pages / 2);
+		ra->size = ra_pages;
+		ra->async_size = 0;
+		ra_submit(ra, mapping, file);
 	}
 }
 

-- 


* [PATCH 7/9] readahead: remove unused do_page_cache_readahead()
       [not found] ` <20071216120418.498110756@mail.ustc.edu.cn>
@ 2007-12-16 11:59   ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-16 11:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

[-- Attachment #1: readahead-remove-do_page_cache_readahead.patch --]
[-- Type: text/plain, Size: 1799 bytes --]

Remove do_page_cache_readahead().
Its last user, mmap read-around, has been changed to call ra_submit().

Also, the no-readahead-if-congested logic is not appropriate here.
Raw 1-page reads can only make things painfully slower, and
users are pretty sensitive about the slow loading of executables.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 include/linux/mm.h |    2 --
 mm/readahead.c     |   16 ----------------
 2 files changed, 18 deletions(-)

--- linux-2.6.24-rc4-mm1.orig/include/linux/mm.h
+++ linux-2.6.24-rc4-mm1/include/linux/mm.h
@@ -1084,8 +1084,6 @@ int write_one_page(struct page *page, in
 #define VM_MAX_READAHEAD	128	/* kbytes */
 #define VM_MIN_READAHEAD	16	/* kbytes (includes current page) */
 
-int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
-			pgoff_t offset, unsigned long nr_to_read);
 int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			pgoff_t offset, unsigned long nr_to_read);
 
--- linux-2.6.24-rc4-mm1.orig/mm/readahead.c
+++ linux-2.6.24-rc4-mm1/mm/readahead.c
@@ -208,22 +208,6 @@ int force_page_cache_readahead(struct ad
 }
 
 /*
- * This version skips the IO if the queue is read-congested, and will tell the
- * block layer to abandon the readahead if request allocation would block.
- *
- * force_page_cache_readahead() will ignore queue congestion and will block on
- * request queues.
- */
-int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
-			pgoff_t offset, unsigned long nr_to_read)
-{
-	if (bdi_read_congested(mapping->backing_dev_info))
-		return -1;
-
-	return __do_page_cache_readahead(mapping, filp, offset, nr_to_read, 0);
-}
-
-/*
  * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
  * sensible upper limit.
  */

-- 


* [PATCH 8/9] readahead: move max_sane_readahead() calls into force_page_cache_readahead()
       [not found] ` <20071216120418.639781633@mail.ustc.edu.cn>
@ 2007-12-16 11:59   ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-16 11:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

[-- Attachment #1: readahead-check-max_sane_readahead.patch --]
[-- Type: text/plain, Size: 1729 bytes --]

Simplify code by moving max_sane_readahead() calls into
force_page_cache_readahead().

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 mm/fadvise.c   |    2 +-
 mm/filemap.c   |    3 +--
 mm/madvise.c   |    3 +--
 mm/readahead.c |    1 +
 4 files changed, 4 insertions(+), 5 deletions(-)

--- linux-2.6.24-rc4-mm1.orig/mm/fadvise.c
+++ linux-2.6.24-rc4-mm1/mm/fadvise.c
@@ -89,7 +89,7 @@ asmlinkage long sys_fadvise64_64(int fd,
 		
 		ret = force_page_cache_readahead(mapping, file,
 				start_index,
-				max_sane_readahead(nrpages));
+				nrpages);
 		if (ret > 0)
 			ret = 0;
 		break;
--- linux-2.6.24-rc4-mm1.orig/mm/filemap.c
+++ linux-2.6.24-rc4-mm1/mm/filemap.c
@@ -1242,8 +1242,7 @@ do_readahead(struct address_space *mappi
 	if (!mapping || !mapping->a_ops || !mapping->a_ops->readpage)
 		return -EINVAL;
 
-	force_page_cache_readahead(mapping, filp, index,
-					max_sane_readahead(nr));
+	force_page_cache_readahead(mapping, filp, index, nr);
 	return 0;
 }
 
--- linux-2.6.24-rc4-mm1.orig/mm/madvise.c
+++ linux-2.6.24-rc4-mm1/mm/madvise.c
@@ -123,8 +123,7 @@ static long madvise_willneed(struct vm_a
 		end = vma->vm_end;
 	end = ((end - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 
-	force_page_cache_readahead(file->f_mapping,
-			file, start, max_sane_readahead(end - start));
+	force_page_cache_readahead(file->f_mapping, file, start, end - start);
 	return 0;
 }
 
--- linux-2.6.24-rc4-mm1.orig/mm/readahead.c
+++ linux-2.6.24-rc4-mm1/mm/readahead.c
@@ -187,6 +187,7 @@ int force_page_cache_readahead(struct ad
 	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
 		return -EINVAL;
 
+	nr_to_read = max_sane_readahead(nr_to_read);
 	while (nr_to_read) {
 		int err;
 

-- 


* [PATCH 9/9] readahead: call max_sane_readahead() in ondemand_readahead()
       [not found] ` <20071216120418.787765636@mail.ustc.edu.cn>
@ 2007-12-16 11:59   ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-16 11:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

[-- Attachment #1: readahead-check-max_sane_readahead2.patch --]
[-- Type: text/plain, Size: 747 bytes --]

Apply the max_sane_readahead() limit in ondemand_readahead(),
just in case someone aggressively sets a huge readahead size.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 mm/readahead.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.24-rc4-mm1.orig/mm/readahead.c
+++ linux-2.6.24-rc4-mm1/mm/readahead.c
@@ -324,9 +324,9 @@ ondemand_readahead(struct address_space 
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	int	max = ra->ra_pages;	/* max readahead pages */
 	pgoff_t prev_offset;
-	int	sequential;
+	int sequential;
+	int max = max_sane_readahead(ra->ra_pages);  /* max readahead pages */
 
 	/*
 	 * It's the expected callback offset, assume sequential access.

-- 


* Re: [PATCH 0/9] mmap read-around and readahead
  2007-12-16 11:59 ` [PATCH 0/9] mmap read-around and readahead Fengguang Wu
@ 2007-12-16 23:35   ` Linus Torvalds
       [not found]     ` <E1J4atl-0000UI-4a@localhost>
                       ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Linus Torvalds @ 2007-12-16 23:35 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Andrew Morton, linux-kernel



On Sun, 16 Dec 2007, Fengguang Wu wrote:
> 
> Here are the mmap read-around related patches initiated by Linus.
> They are for linux-2.6.24-rc4-mm1.  The one major new feature -
> auto detection and early readahead for mmap sequential reads - runs
> as expected on my desktop :-)

Just out of interest - did you check to see if it makes any difference to 
any IO patterns (or even timings)?

		Linus


* Re: [PATCH 2/9] readahead: clean up and simplify the code for filemap page fault readahead
       [not found] ` <20071216120417.748714367@mail.ustc.edu.cn>
  2007-12-16 11:59   ` [PATCH 2/9] readahead: clean up and simplify the code for filemap page fault readahead Fengguang Wu
@ 2007-12-18  8:19   ` Nick Piggin
       [not found]     ` <E1J4ay1-0000ak-3e@localhost>
  1 sibling, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2007-12-18  8:19 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Andrew Morton, Linus Torvalds, Andrew Morton, linux-kernel

On Sun, Dec 16, 2007 at 07:59:30PM +0800, Fengguang Wu wrote:

> @@ -1321,78 +1401,69 @@ int filemap_fault(struct vm_area_struct 
>  	struct address_space *mapping = file->f_mapping;
>  	struct file_ra_state *ra = &file->f_ra;
>  	struct inode *inode = mapping->host;
> +	pgoff_t offset = vmf->pgoff;
>  	struct page *page;
>  	unsigned long size;
> -	int did_readaround = 0;
>  	int ret = 0;
>  
>  	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
>  	if (vmf->pgoff >= size)
>  		return VM_FAULT_SIGBUS;
>  
> -	/* If we don't want any read-ahead, don't bother */
> -	if (VM_RandomReadHint(vma))
> -		goto no_cached_page;
> -
>  	/*
> -	 * Do we have something in the page cache already?
> +	 * Do we have something in the page cache already that
> +	 * is unlocked and already up-to-date?
>  	 */
> -retry_find:
> -	page = find_lock_page(mapping, vmf->pgoff);
> -	/*
> -	 * For sequential accesses, we use the generic readahead logic.
> -	 */
> -	if (VM_SequentialReadHint(vma)) {
> -		if (!page) {
> -			page_cache_sync_readahead(mapping, ra, file,
> -							   vmf->pgoff, 1);
> -			page = find_lock_page(mapping, vmf->pgoff);
> -			if (!page)
> -				goto no_cached_page;
> -		}
> -		if (PageReadahead(page)) {
> -			page_cache_async_readahead(mapping, ra, file, page,
> -							   vmf->pgoff, 1);
> -		}
> -	}
> +	read_lock_irq(&mapping->tree_lock);
> +	page = radix_tree_lookup(&mapping->page_tree, offset);
> +	if (likely(page)) {
> +		int got_lock, uptodate;
> +		page_cache_get(page);
> +
> +		got_lock = !TestSetPageLocked(page);
> +		uptodate = PageUptodate(page);
> +		read_unlock_irq(&mapping->tree_lock);

If we could avoid open coding tree_lock here (and expanding its coverage
to PageUptodate), that would be nice. I don't think it gains us too much.


* Re: [PATCH 0/9] mmap read-around and readahead
       [not found]     ` <E1J4atl-0000UI-4a@localhost>
@ 2007-12-18 11:46       ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-18 11:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Nick Piggin

On Sun, Dec 16, 2007 at 03:35:58PM -0800, Linus Torvalds wrote:
> 
> 
> On Sun, 16 Dec 2007, Fengguang Wu wrote:
> > 
> > Here are the mmap read-around related patches initiated by Linus.
> > They are for linux-2.6.24-rc4-mm1.  The one major new feature -
> > auto detection and early readahead for mmap sequential reads - runs
> > as expected on my desktop :-)
> 
> Just out of interest - did you check to see if it makes any difference to 
> any IO patterns (or even timings)?

No timings for now... but I wrote a debug patch (attached) and watched
it running for about a week.  Here are some interesting numbers:

% grep .so, /var/log/kern.log|grep init0|wc                   
   4085   60806  583895

% grep .so, /var/log/kern.log|grep around|wc
  14438  215265 2107308
% grep .so, /var/log/kern.log|grep around|grep '= 32' | wc    
   3133   46757  462446

% grep .so, /var/log/kern.log|grep interleaved|wc
    997   14866  148921
% grep .so, /var/log/kern.log|grep interleaved|grep '= 0'|wc  
    544    8089   79661
% grep .so, /var/log/kern.log|grep interleaved|grep '= 32'|wc
    179    2683   28233

% grep .so, /var/log/kern.log|grep sequential|wc 
   3499   52275  541319
% grep .so, /var/log/kern.log|grep sequential|grep '= 0' | wc
    915   13598  131953
% grep .so, /var/log/kern.log|grep sequential|grep '= 32' | wc
   1327   19880  212896

That says, there are
   4085 page faults on start-of-lib-file,
  14438 mmap read-around,       22% full ra size
   3499 mmap async readahead,   38% full ra size, or 51% if removing pure cache hits
    997 mmap sync readahead,    18% full ra size, or 40% if removing pure cache hits
Those are good numbers: I/O sizes get larger, and possibly fewer I/O waits :-)

Sure, it's a rather coarse estimation, but there are some sequential mmap accesses.
E.g.

[11736.998347] readahead-init0(process: sh/23926, file: sda1/w3m, offset=0:-1, ra=0+4-3) = 4
[11737.014985] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=290+32-0) = 17
[11737.019488] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=118+32-0) = 32
[11737.024921] readahead-interleaved(process: w3m/23926, file: sda1/w3m, offset=0:2, ra=4+6-6) = 6
[11737.025726] readahead-sequential(process: w3m/23926, file: sda1/w3m, offset=0:3, ra=10+12-12) = 12
[11737.025794] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:4, ra=90+32-0) = 28
--- sequential begin ---
[11737.037893] readahead-init(process: w3m/23926, file: sda1/w3m, offset=0:149, ra=150+64-32) = 64
[11737.043928] readahead-sequential(process: w3m/23926, file: sda1/w3m, offset=0:181, ra=214+32-32) = 32
[11737.044086] readahead-sequential(process: w3m/23926, file: sda1/w3m, offset=0:213, ra=246+32-32) = 32
[11737.045633] readahead-sequential(process: w3m/23926, file: sda1/w3m, offset=0:245, ra=278+32-32) = 12
[11737.047321] readahead-sequential(process: w3m/23926, file: sda1/w3m, offset=0:277, ra=310+32-32) = 0
--- sequential end ---
[11737.048296] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:119, ra=48+32-0) = 32
[11737.066908] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:63, ra=73+32-0) = 10
[11737.136880] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:116, ra=30+32-0) = 18
[11737.166005] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:37, ra=6+32-0) = 8


But there is also one minor problem.

[16416.600720] readahead-init0(process: zsh/30490, file: sda1/bc, offset=0:-1, ra=0+4-3) = 4
[16416.607967] readahead-around(process: bc/30490, file: sda1/bc, offset=0:0, ra=1+32-0) = 14

The 4-page readahead-init0() hurts performance. It occurs before every initial mmap read.
A longer example:

wfg ~% dmesg|grep mplayer
[ 1221.454230] readahead-init0(process: mutt/7131, file: md0/mplayer-devel, offset=0:-1, ra=0+4-3) = 4
[ 1378.667305] readahead-init0(process: strace/7352, file: sda1/mplayer, offset=0:-1, ra=0+4-3) = 4
[ 1378.692389] readahead-around(process: mplayer/7352, file: sda1/mplayer, offset=0:0, ra=2212+32-0) = 17
[ 1378.703656] readahead-around(process: mplayer/7352, file: sda1/mplayer, offset=0:0, ra=2061+32-0) = 32
[ 1378.715537] readahead-around(process: mplayer/7352, file: sda1/mplayer, offset=0:2077, ra=0+32-0) = 28
[ 1378.716261] readahead-around(process: mplayer/7352, file: sda1/mplayer, offset=0:10, ra=44+32-0) = 32
[ 1378.727570] readahead-init0(process: mplayer/7352, file: sda1/libdirectfb-0.9.so.25.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.740579] readahead-around(process: mplayer/7352, file: sda1/libdirectfb-0.9.so.25.0.0, offset=0:0, ra=79+32-0) = 17
[ 1378.744826] readahead-around(process: mplayer/7352, file: sda1/libdirectfb-0.9.so.25.0.0, offset=0:1, ra=0+32-0) = 28
[ 1378.749882] readahead-init0(process: mplayer/7352, file: sda1/libXv.so.1.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.754546] readahead-around(process: mplayer/7352, file: sda1/libXv.so.1.0.0, offset=0:0, ra=0+32-0) = 1
[ 1378.758057] readahead-init0(process: mplayer/7352, file: sda1/libXvMC.so.1.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.759566] readahead-init0(process: mplayer/7352, file: sda1/libXvMCW.so.1.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.764991] readahead-init0(process: mplayer/7352, file: sda1/libXxf86dga.so.1.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.766036] readahead-around(process: mplayer/7352, file: sda1/libXxf86dga.so.1.0.0, offset=0:0, ra=0+32-0) = 2
[ 1378.766887] readahead-init0(process: mplayer/7352, file: sda1/libGL.so.1.2, offset=0:-1, ra=0+4-3) = 4
[ 1378.778437] readahead-around(process: mplayer/7352, file: sda1/libGL.so.1.2, offset=0:0, ra=109+32-0) = 17
[ 1378.782107] readahead-around(process: mplayer/7352, file: sda1/libGL.so.1.2, offset=0:2, ra=1+32-0) = 29
[ 1378.792935] readahead-init0(process: mplayer/7352, file: sda1/libggi.so.2.0.2, offset=0:-1, ra=0+4-3) = 4
[ 1378.799236] readahead-around(process: mplayer/7352, file: sda1/libggi.so.2.0.2, offset=0:0, ra=132+32-0) = 18
[ 1378.808167] readahead-around(process: mplayer/7352, file: sda1/libggi.so.2.0.2, offset=0:0, ra=0+32-0) = 28
[ 1378.808759] readahead-init0(process: mplayer/7352, file: sda1/libaa.so.1.0.4, offset=0:-1, ra=0+4-3) = 4
[ 1378.818428] readahead-around(process: mplayer/7352, file: sda1/libaa.so.1.0.4, offset=0:0, ra=12+32-0) = 18
[ 1378.830829] readahead-init0(process: mplayer/7352, file: sda1/libcaca.so.0.99.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.832195] readahead-around(process: mplayer/7352, file: sda1/libcaca.so.0.99.0, offset=0:0, ra=0+32-0) = 6
[ 1378.832945] readahead-init0(process: mplayer/7352, file: sda1/libcucul.so.0.99.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.837474] readahead-around(process: mplayer/7352, file: sda1/libcucul.so.0.99.0, offset=0:0, ra=135+32-0) = 18
[ 1378.844951] readahead-around(process: mplayer/7352, file: sda1/libcucul.so.0.99.0, offset=0:151, ra=1+32-0) = 29
[ 1378.845851] readahead-init0(process: mplayer/7352, file: sda1/libSDL-1.2.so.0.11.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.867151] readahead-around(process: mplayer/7352, file: sda1/libSDL-1.2.so.0.11.0, offset=0:0, ra=88+32-0) = 18
[ 1378.871796] readahead-around(process: mplayer/7352, file: sda1/libSDL-1.2.so.0.11.0, offset=0:0, ra=0+32-0) = 28
[ 1378.873248] readahead-init0(process: mplayer/7352, file: sda1/libartsc.so.0.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.885419] readahead-around(process: mplayer/7352, file: sda1/libartsc.so.0.0.0, offset=0:0, ra=0+32-0) = 2
[ 1378.892469] readahead-init0(process: mplayer/7352, file: sda1/libpulse.so.0.2.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.903642] readahead-around(process: mplayer/7352, file: sda1/libpulse.so.0.2.0, offset=0:0, ra=43+32-0) = 17
[ 1378.907206] readahead-around(process: mplayer/7352, file: sda1/libpulse.so.0.2.0, offset=0:1, ra=0+32-0) = 28
[ 1378.918549] readahead-init0(process: mplayer/7352, file: sda1/libjack.so.0.0.23, offset=0:-1, ra=0+4-3) = 4
[ 1378.928575] readahead-around(process: mplayer/7352, file: sda1/libjack.so.0.0.23, offset=0:0, ra=2+32-0) = 16
[ 1378.940046] readahead-init0(process: mplayer/7352, file: sda1/libopenal.so.0.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.963093] readahead-around(process: mplayer/7352, file: sda1/libopenal.so.0.0.0, offset=0:0, ra=42+32-0) = 17
[ 1378.981748] readahead-init0(process: mplayer/7352, file: sda1/libfaac.so.0.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.993281] readahead-around(process: mplayer/7352, file: sda1/libfaac.so.0.0.0, offset=0:0, ra=0+32-0) = 14
[ 1378.994296] readahead-init0(process: mplayer/7352, file: sda1/libx264.so.55, offset=0:-1, ra=0+4-3) = 4
[ 1379.004907] readahead-around(process: mplayer/7352, file: sda1/libx264.so.55, offset=0:0, ra=112+32-0) = 18
[ 1379.010374] readahead-around(process: mplayer/7352, file: sda1/libx264.so.55, offset=0:0, ra=0+32-0) = 28
[ 1379.025175] readahead-init0(process: mplayer/7352, file: sda1/libsmbclient.so.0.1, offset=0:-1, ra=0+4-3) = 4
[ 1379.040139] readahead-around(process: mplayer/7352, file: sda1/libsmbclient.so.0.1, offset=0:0, ra=530+32-0) = 17
[ 1379.043905] readahead-around(process: mplayer/7352, file: sda1/libsmbclient.so.0.1, offset=0:535, ra=0+32-0) = 28
[ 1379.044276] readahead-around(process: mplayer/7352, file: sda1/libsmbclient.so.0.1, offset=0:8, ra=49+32-0) = 32
[ 1379.083560] readahead-init0(process: mplayer/7352, file: sda1/libungif.so.4.1.4, offset=0:-1, ra=0+4-3) = 4
[ 1379.088050] readahead-around(process: mplayer/7352, file: sda1/libungif.so.4.1.4, offset=0:0, ra=0+32-0) = 4
[ 1379.095605] readahead-init0(process: mplayer/7352, file: sda1/libcdda_interface.so.0.10.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.100462] readahead-around(process: mplayer/7352, file: sda1/libcdda_interface.so.0.10.0, offset=0:0, ra=0+32-0) = 12
[ 1379.100889] readahead-init0(process: mplayer/7352, file: sda1/libcdda_paranoia.so.0.10.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.108911] readahead-around(process: mplayer/7352, file: sda1/libcdda_paranoia.so.0.10.0, offset=0:0, ra=0+32-0) = 4
[ 1379.110094] readahead-init0(process: mplayer/7352, file: sda1/libfribidi.so.0.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.111707] readahead-around(process: mplayer/7352, file: sda1/libfribidi.so.0.0.0, offset=0:0, ra=0+32-0) = 11
[ 1379.116159] readahead-init0(process: mplayer/7352, file: sda1/libspeex.so.1.2.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.134065] readahead-around(process: mplayer/7352, file: sda1/libspeex.so.1.2.0, offset=0:0, ra=18+32-0) = 17
[ 1379.137322] readahead-init0(process: mplayer/7352, file: sda1/libtheora.so.0.2.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.137976] readahead-around(process: mplayer/7352, file: sda1/libtheora.so.0.2.0, offset=0:0, ra=33+32-0) = 18
[ 1379.141476] readahead-init0(process: mplayer/7352, file: sda1/libmpcdec.so.3.1.1, offset=0:-1, ra=0+4-3) = 4
[ 1379.150304] readahead-around(process: mplayer/7352, file: sda1/libmpcdec.so.3.1.1, offset=0:0, ra=0+32-0) = 10
[ 1379.151400] readahead-init0(process: mplayer/7352, file: sda1/libamrnb.so.2.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.169518] readahead-around(process: mplayer/7352, file: sda1/libamrnb.so.2.0.0, offset=0:0, ra=44+32-0) = 17
[ 1379.171870] readahead-init0(process: mplayer/7352, file: sda1/libamrwb.so.2.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.172558] readahead-around(process: mplayer/7352, file: sda1/libamrwb.so.2.0.0, offset=0:0, ra=28+32-0) = 17
[ 1379.179794] readahead-init0(process: mplayer/7352, file: sda1/libdv.so.4.0.3, offset=0:-1, ra=0+4-3) = 4
[ 1379.196072] readahead-around(process: mplayer/7352, file: sda1/libdv.so.4.0.3, offset=0:0, ra=13+32-0) = 17
[ 1379.209467] readahead-init0(process: mplayer/7352, file: sda1/libxvidcore.so.4.1, offset=0:-1, ra=0+4-3) = 4
[ 1379.210581] readahead-around(process: mplayer/7352, file: sda1/libxvidcore.so.4.1, offset=0:0, ra=115+32-0) = 18
[ 1379.225045] readahead-init0(process: mplayer/7352, file: sda1/liblirc_client.so.0.1.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.229523] readahead-around(process: mplayer/7352, file: sda1/liblirc_client.so.0.1.0, offset=0:0, ra=0+32-0) = 2
[ 1379.230907] readahead-init0(process: mplayer/7352, file: sda1/libdirect-0.9.so.25.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.237679] readahead-around(process: mplayer/7352, file: sda1/libdirect-0.9.so.25.0.0, offset=0:0, ra=0+32-0) = 12
[ 1379.238163] readahead-init0(process: mplayer/7352, file: sda1/libfusion-0.9.so.25.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.245010] readahead-around(process: mplayer/7352, file: sda1/libfusion-0.9.so.25.0.0, offset=0:0, ra=0+32-0) = 3
[ 1379.246950] readahead-init0(process: mplayer/7352, file: sda1/libXxf86vm.so.1.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.255703] readahead-around(process: mplayer/7352, file: sda1/libXxf86vm.so.1.0.0, offset=0:0, ra=0+32-0) = 1

There are so many readahead-init0() calls because ld-linux.so will do a
read() of the first 832 bytes (L1 below) before doing the mmap:

L0: open("/lib/libc.so.6", O_RDONLY)        = 3
L1: read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\342"..., 832) = 832
L2: fstat(3, {st_mode=S_IFREG|0755, st_size=1420624, ...}) = 0
L3: mmap(NULL, 3527256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac6e51d000
L4: mprotect(0x7fac6e671000, 2097152, PROT_NONE) = 0
L5: mmap(0x7fac6e871000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x154000) = 0x7fac6e871000
L6: mmap(0x7fac6e876000, 16984, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fac6e876000
L7: close(3)                                = 0


I cannot think of a good solution for it. Teaching ld-linux.so to
blindly do an fadvise(128KB) looks bad, and the kernel can do little
about it.

This is also the major reason I disabled interleaved readahead support
for mmap reads. Otherwise the PG_readahead flag left behind by
ld-linux.so would trigger a _small_ interleaved readahead like this:

readahead-interleaved(process: firefox-bin/4596, file: sda1/libmozjs.so, offset=0, ra=4+6-6) = 6

It would be a much larger read-around if we didn't do that readahead ;-)

Fengguang


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/9] readahead: clean up and simplify the code for filemap page fault readahead
       [not found]     ` <E1J4ay1-0000ak-3e@localhost>
@ 2007-12-18 11:50       ` Fengguang Wu
  2007-12-18 23:54       ` Nick Piggin
  1 sibling, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-18 11:50 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Linus Torvalds, Andrew Morton, linux-kernel

On Tue, Dec 18, 2007 at 09:19:07AM +0100, Nick Piggin wrote:
> On Sun, Dec 16, 2007 at 07:59:30PM +0800, Fengguang Wu wrote:

> > +	read_lock_irq(&mapping->tree_lock);
> > +	page = radix_tree_lookup(&mapping->page_tree, offset);
> > +	if (likely(page)) {
> > +		int got_lock, uptodate;
> > +		page_cache_get(page);
> > +
> > +		got_lock = !TestSetPageLocked(page);
> > +		uptodate = PageUptodate(page);
> > +		read_unlock_irq(&mapping->tree_lock);
> 
> If we could avoid open coding tree_lock here (and expanding its coverage
> to PageUptodate), that would be nice. I don't think it gains us too much.

To use find_get_page()? That would be nice to me, too.



* Re: [PATCH 0/9] mmap read-around and readahead
       [not found]       ` <E1J4bK6-0001BG-Mp@localhost>
@ 2007-12-18 12:13         ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-18 12:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Nick Piggin

On Tue, Dec 18, 2007 at 07:46:09PM +0800, Fengguang Wu wrote:
> No timings for now... but I wrote a debug patch (attached) and watched
> it running for about a week.  Here are some interesting numbers:

Here is the (forgotten) readahead-debug.patch:

---
 include/linux/fs.h |   43 ++++++++++++++++++++++++++++++++++
 mm/Kconfig         |   19 +++++++++++++++
 mm/filemap.c       |    1 
 mm/readahead.c     |   54 ++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 116 insertions(+), 1 deletion(-)

--- linux-2.6.24-rc4-mm1.orig/include/linux/fs.h
+++ linux-2.6.24-rc4-mm1/include/linux/fs.h
@@ -760,11 +760,54 @@ struct file_ra_state {
 	unsigned int async_size;	/* do asynchronous readahead when
 					   there are only # of pages ahead */
 
+	unsigned int flags;
 	unsigned int ra_pages;		/* Maximum readahead window */
 	int mmap_miss;			/* Cache miss stat for mmap accesses */
 	loff_t prev_pos;		/* Cache last read() position */
 };
 
+#define RA_CLASS_SHIFT		4
+#define RA_CLASS_MASK		((1 << RA_CLASS_SHIFT) - 1)
+/*
+ * Detailed classification of read-ahead behaviors.
+ */
+enum ra_class {
+	RA_CLASS_INIT0,
+	RA_CLASS_INIT,
+	RA_CLASS_SEQUENTIAL,
+	RA_CLASS_INTERLEAVED,
+	RA_CLASS_CONTEXT,
+	RA_CLASS_AROUND,
+	RA_CLASS_COUNT
+};
+
+static inline enum ra_class ra_class_new(struct file_ra_state *ra)
+{
+	return ra->flags & RA_CLASS_MASK;
+}
+
+static inline enum ra_class ra_class_old(struct file_ra_state *ra)
+{
+	return (ra->flags >> RA_CLASS_SHIFT) & RA_CLASS_MASK;
+}
+
+/*
+ * Which method is issuing this read-ahead?
+ */
+static inline void ra_set_class(struct file_ra_state *ra, enum ra_class ra_class)
+{
+	unsigned long flags_mask;
+	unsigned long flags;
+	unsigned long old_ra_class;
+
+	flags_mask = ~(RA_CLASS_MASK | (RA_CLASS_MASK << RA_CLASS_SHIFT));
+	flags = ra->flags & flags_mask;
+
+	old_ra_class = ra_class_new(ra) << RA_CLASS_SHIFT;
+
+	ra->flags = flags | old_ra_class | ra_class;
+}
+
 /*
  * Check if @index falls in the readahead windows.
  */
--- linux-2.6.24-rc4-mm1.orig/mm/Kconfig
+++ linux-2.6.24-rc4-mm1/mm/Kconfig
@@ -194,3 +194,22 @@ config NR_QUICK
 config VIRT_TO_BUS
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS
+
+config DEBUG_READAHEAD
+	bool "Readahead debug and accounting"
+	default y
+	select DEBUG_FS
+	help
+	  This option injects extra code to dump detailed debug traces and do
+	  readahead events accounting.
+
+	  To actually get the data:
+
+	  mkdir /debug
+	  mount -t debug none /debug
+
+	  After that you can do the following:
+
+	  echo > /debug/readahead/events # reset the counters
+	  cat /debug/readahead/events    # check the counters
+
--- linux-2.6.24-rc4-mm1.orig/mm/readahead.c
+++ linux-2.6.24-rc4-mm1/mm/readahead.c
@@ -16,6 +16,29 @@
 #include <linux/task_io_accounting_ops.h>
 #include <linux/pagevec.h>
 #include <linux/pagemap.h>
+#include <linux/debugfs.h>
+
+static const char * const ra_class_name[] = {
+	[RA_CLASS_INIT0]	= "init0",
+	[RA_CLASS_INIT]		= "init",
+	[RA_CLASS_SEQUENTIAL]	= "sequential",
+	[RA_CLASS_INTERLEAVED]	= "interleaved",
+	[RA_CLASS_CONTEXT]	= "context",
+	[RA_CLASS_AROUND]	= "around",
+};
+
+#ifdef CONFIG_DEBUG_READAHEAD
+static u32 readahead_debug_level = 1;
+#  define debug_option(o)		(o)
+#else
+#  define debug_option(o)		(0)
+#  define readahead_debug_level 	(0)
+#endif /* CONFIG_DEBUG_READAHEAD */
+
+#define dprintk(args...) \
+	do { if (readahead_debug_level >= 2) printk(KERN_DEBUG args); } while(0)
+#define ddprintk(args...) \
+	do { if (readahead_debug_level >= 3) printk(KERN_DEBUG args); } while(0)
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
@@ -220,6 +243,13 @@ unsigned long max_sane_readahead(unsigne
 
 static int __init readahead_init(void)
 {
+#ifdef CONFIG_DEBUG_READAHEAD
+	struct dentry *root;
+
+	root = debugfs_create_dir("readahead", NULL);
+
+	debugfs_create_u32("debug_level", 0644, root, &readahead_debug_level);
+#endif
 	return bdi_init(&default_backing_dev_info);
 }
 subsys_initcall(readahead_init);
@@ -235,6 +265,15 @@ unsigned long ra_submit(struct file_ra_s
 	actual = __do_page_cache_readahead(mapping, filp,
 					ra->start, ra->size, ra->async_size);
 
+	dprintk("readahead-%s(process: %s/%d, file: %s/%s, "
+			"offset=%ld:%ld, ra=%ld+%d-%d) = %d\n",
+			ra_class_name[ra_class_new(ra)],
+			current->comm, current->pid,
+			mapping->host->i_sb->s_id,
+			filp->f_path.dentry->d_iname,
+			(long)(filp->f_pos >> PAGE_CACHE_SHIFT),
+			(long)(ra->prev_pos >> PAGE_CACHE_SHIFT),
+			ra->start, ra->size, ra->async_size, actual);
 	return actual;
 }
 
@@ -337,6 +376,7 @@ ondemand_readahead(struct address_space 
 		ra->start += ra->size;
 		ra->size = get_next_ra_size(ra, max);
 		ra->async_size = ra->size;
+		ra_set_class(ra, RA_CLASS_SEQUENTIAL);
 		goto readit;
 	}
 
@@ -348,8 +388,15 @@ ondemand_readahead(struct address_space 
 	 * Read as is, and do not pollute the readahead state.
 	 */
 	if (!hit_readahead_marker && !sequential) {
-		return __do_page_cache_readahead(mapping, filp,
+		int actual = __do_page_cache_readahead(mapping, filp,
 						offset, req_size, 0);
+		dprintk("read-random(process: %s/%d, file: %s/%s, "
+			"req=%ld+%ld) = %d\n",
+				current->comm, current->pid,
+				mapping->host->i_sb->s_id,
+				filp->f_path.dentry->d_iname,
+				offset, req_size, actual);
+		return actual;
 	}
 
 	/*
@@ -372,6 +419,7 @@ ondemand_readahead(struct address_space 
 		ra->size = start - offset;	/* old async_size */
 		ra->size = get_next_ra_size(ra, max);
 		ra->async_size = ra->size;
+		ra_set_class(ra, RA_CLASS_INTERLEAVED);
 		goto readit;
 	}
 
@@ -385,6 +433,10 @@ ondemand_readahead(struct address_space 
 	ra->start = offset;
 	ra->size = get_init_ra_size(req_size, max);
 	ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
+	if (offset)
+		ra_set_class(ra, RA_CLASS_INIT);
+	else
+		ra_set_class(ra, RA_CLASS_INIT0);
 
 readit:
 	/*
--- linux-2.6.24-rc4-mm1.orig/mm/filemap.c
+++ linux-2.6.24-rc4-mm1/mm/filemap.c
@@ -1340,6 +1340,7 @@ static void do_sync_mmap_readahead(struc
 		ra->start = max_t(long, 0, offset - ra_pages / 2);
 		ra->size = ra_pages;
 		ra->async_size = 0;
+		ra_set_class(ra, RA_CLASS_AROUND);
 		ra_submit(ra, mapping, file);
 	}
 }



* Re: [PATCH 2/9] readahead: clean up and simplify the code for filemap page fault readahead
       [not found]     ` <E1J4ay1-0000ak-3e@localhost>
  2007-12-18 11:50       ` Fengguang Wu
@ 2007-12-18 23:54       ` Nick Piggin
       [not found]         ` <E1J4spp-0001pR-JQ@localhost>
  1 sibling, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2007-12-18 23:54 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Andrew Morton, Linus Torvalds, Andrew Morton, linux-kernel

On Tue, Dec 18, 2007 at 07:50:33PM +0800, Fengguang Wu wrote:
> On Tue, Dec 18, 2007 at 09:19:07AM +0100, Nick Piggin wrote:
> > On Sun, Dec 16, 2007 at 07:59:30PM +0800, Fengguang Wu wrote:
> 
> > > +	read_lock_irq(&mapping->tree_lock);
> > > +	page = radix_tree_lookup(&mapping->page_tree, offset);
> > > +	if (likely(page)) {
> > > +		int got_lock, uptodate;
> > > +		page_cache_get(page);
> > > +
> > > +		got_lock = !TestSetPageLocked(page);
> > > +		uptodate = PageUptodate(page);
> > > +		read_unlock_irq(&mapping->tree_lock);
> > 
> > If we could avoid open coding tree_lock here (and expanding its coverage
> > to PageUptodate), that would be nice. I don't think it gains us too much.
> 
> To use find_get_page()? That would be nice to me, too.

Exactly.


* Re: [PATCH 2/9] readahead: clean up and simplify the code for filemap page fault readahead
       [not found]         ` <E1J4spp-0001pR-JQ@localhost>
@ 2007-12-19  6:55           ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-19  6:55 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Linus Torvalds, Andrew Morton, linux-kernel

On Wed, Dec 19, 2007 at 12:54:23AM +0100, Nick Piggin wrote:
> On Tue, Dec 18, 2007 at 07:50:33PM +0800, Fengguang Wu wrote:
> > On Tue, Dec 18, 2007 at 09:19:07AM +0100, Nick Piggin wrote:
> > > On Sun, Dec 16, 2007 at 07:59:30PM +0800, Fengguang Wu wrote:
> > 
> > > > +	read_lock_irq(&mapping->tree_lock);
> > > > +	page = radix_tree_lookup(&mapping->page_tree, offset);
> > > > +	if (likely(page)) {
> > > > +		int got_lock, uptodate;
> > > > +		page_cache_get(page);
> > > > +
> > > > +		got_lock = !TestSetPageLocked(page);
> > > > +		uptodate = PageUptodate(page);
> > > > +		read_unlock_irq(&mapping->tree_lock);
> > > 
> > > If we could avoid open coding tree_lock here (and expanding its coverage
> > > to PageUptodate), that would be nice. I don't think it gains us too much.
> > 
> > To use find_get_page()? That would be nice to me, too.
> 
> Exactly.

Done. Also, I think it's better not to split the `found' case into
the two cases of 'found & lock-ok' and 'found & wait-lock' for now.
That removes some more code:

-/*
- * A successful mmap hit is when we didn't need any IO at all,
- * and got an immediate lock on an up-to-date page. There's not
- * much to do, except decide on whether we want to trigger read-
- * ahead.
- *
- * We currently do the same thing as we did for a locked page
- * that we're waiting for.
- */
-static void do_mmap_hit(struct vm_area_struct *vma,
-			struct file_ra_state *ra,
-			struct file *file,
-			struct page *page,
-			pgoff_t offset)
-{
-	do_async_mmap_readahead(vma, ra, file, page, offset);
-}
-
 /**
  * filemap_fault - read in file data for page fault handling
  * @vma:	vma in which the fault was taken
@@ -1411,28 +1396,13 @@
 		return VM_FAULT_SIGBUS;
 
 	/*
-	 * Do we have something in the page cache already that
-	 * is unlocked and already up-to-date?
+	 * Do we have something in the page cache already?
 	 */
 	page = find_get_page(mapping, offset);
 	if (likely(page)) {
-		if (likely(!TestSetPageLocked(page))) {
-			/*
-			 * Previous IO error? No read-ahead, but try to
-			 * re-do a single read.
-			 */
-			if (unlikely(!PageUptodate(page)))
-				goto page_not_uptodate;
-
-			do_mmap_hit(vma, ra, file, page, offset);
-			goto found_it;
-		}
-
 		/*
-		 * We found the page, but it was locked..
-		 *
-		 * So do async readahead and wait for it to
-		 * unlock.
+		 * We found the page, so try async readahead before
+		 * waiting for the lock.
 		 */
 		do_async_mmap_readahead(vma, ra, file, page, offset);
 		lock_page(page);



* Re: [PATCH 0/9] mmap read-around and readahead
       [not found]     ` <E1J4tUN-0002IB-F6@localhost>
@ 2007-12-19  7:37       ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-19  7:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Nick Piggin

On Sun, Dec 16, 2007 at 03:35:58PM -0800, Linus Torvalds wrote:
> 
> 
> On Sun, 16 Dec 2007, Fengguang Wu wrote:
> > 
> > Here are the mmap read-around related patches initiated by Linus.
> > They are for linux-2.6.24-rc4-mm1.  The one major new feature -
> > auto detection and early readahead for mmap sequential reads - runs
> > as expected on my desktop :-)
> 
> Just out of interest - did you check to see if it makes any difference to 
> any IO patterns (or even timings)?

Now I have some numbers on 100,000 sequential mmap reads:

                                    user       system    cpu        total
(1-1)  plain -mm, 128KB readaround: 3.224      2.554     48.40%     11.838
(1-2)  plain -mm, 256KB readaround: 3.170      2.392     46.20%     11.976
(2)  patched -mm, 128KB readahead:  3.117      2.448     47.33%     11.607

The patched kernel (2) has the smallest total time. It has no cache-hit
overhead and less I/O block time (thanks to async readahead). Here the
I/O size makes little difference, since there is only a single stream.

Note that (1-1)'s real I/O size is 64KB and (1-2)'s real I/O size is
128KB, since half of the read-around pages will be cache hits.

Fengguang
---

PS. raw time numbers:

        1) linux-2.6.24-rc5-mm1, 128KB read_ahead_kb:

                3.27s user 2.62s system 50% cpu 11.730 total
                3.25s user 2.65s system 49% cpu 11.816 total
                3.07s user 2.62s system 47% cpu 11.911 total
                3.32s user 2.42s system 48% cpu 11.948 total
                3.21s user 2.46s system 48% cpu 11.787 total

        2) linux-2.6.24-rc5-mm1, 256KB read_ahead_kb:

                3.00s user 2.46s system 45% cpu 12.077 total
                3.41s user 2.51s system 49% cpu 12.038 total
                3.25s user 2.34s system 47% cpu 11.889 total
                3.13s user 2.33s system 45% cpu 11.922 total
                3.06s user 2.32s system 45% cpu 11.952 total

        3) linux-2.6.24-rc5-mm1 + this patchset, 128KB read_ahead_kb:

                2.79s user 2.26s system 43% cpu 11.515 total
                3.19s user 2.21s system 46% cpu 11.563 total
                3.28s user 2.51s system 49% cpu 11.596 total
                3.22s user 2.75s system 51% cpu 11.687 total
                3.08s user 2.58s system 48% cpu 11.643 total
                3.14s user 2.38s system 47% cpu 11.637 total
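(The benchmark source itself is not posted in this thread; below is only
a guess at the kind of reader measured, touching one byte per page of an
mmap()ed file front to back, which is exactly the access pattern the
sync/async mmap readahead is meant to detect.  File path and size are
whatever the caller supplies.)

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Touch one byte per page, in order; returns pages touched, -1 on error. */
static long mmap_sequential_touch(const char *path)
{
    struct stat st;
    long page = sysconf(_SC_PAGESIZE);
    long pages = 0;
    unsigned char *p;
    volatile unsigned char sink = 0;
    off_t off;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    for (off = 0; off < st.st_size; off += page) {
        sink = p[off];          /* one page fault at a time, sequentially */
        pages++;
    }
    (void)sink;
    munmap(p, st.st_size);
    return pages;
}
```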



* [PATCH 3/9] readahead: auto detection of sequential mmap reads
       [not found] ` <20071222013314.837299924@mail.ustc.edu.cn>
@ 2007-12-22  1:31   ` Fengguang Wu
  0 siblings, 0 replies; 19+ messages in thread
From: Fengguang Wu @ 2007-12-22  1:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, Nick Piggin, linux-kernel

[-- Attachment #1: readahead-auto-detect-mmap-sequential-reads.patch --]
[-- Type: text/plain, Size: 2138 bytes --]

Auto-detect sequential mmap reads and do sync/async readahead for them.

The sequential mmap readahead will be triggered when
- sync readahead: it's a major fault and (prev_offset==offset-1);
- async readahead: minor fault on PG_readahead page with valid readahead state.

It's a bit conservative to require a valid readahead state for async
readahead, which means we don't do readahead for interleaved reads for
now; but let's keep it safe for this initial try.
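(The two conditions above, restated outside the kernel as a minimal
sketch with a stand-in struct.  `prev_pos_page` is my shorthand for
`prev_pos >> PAGE_CACHE_SHIFT`, not a real file_ra_state field, and the
VM_SequentialReadHint() madvise path is omitted.)

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the few file_ra_state fields the checks use. */
struct ra_state {
    unsigned long start, size, async_size;
    unsigned long prev_pos_page;        /* prev_pos >> PAGE_CACHE_SHIFT */
};

/* Sync side: the major fault is treated as sequential when it lands
 * right after the previously recorded read position. */
static bool mmap_fault_is_sequential(const struct ra_state *ra,
                                     unsigned long offset)
{
    return offset - 1 == ra->prev_pos_page;
}

/* Async side: only trust PG_readahead when the readahead state is
 * valid for this page, i.e. offset sits at the async readahead mark. */
static bool mmap_async_trigger(const struct ra_state *ra,
                               unsigned long offset, bool page_readahead)
{
    return page_readahead &&
        offset == ra->start + ra->size - ra->async_size;
}
```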

======
The benefits of doing readahead instead of read-around:
- less I/O wait thanks to async readahead
- double the real I/O size and no more cache hits

Some numbers on 100,000 sequential mmap reads:

                                    user       system    cpu        total
(1-1)  plain -mm, 128KB readaround: 3.224      2.554     48.40%     11.838
(1-2)  plain -mm, 256KB readaround: 3.170      2.392     46.20%     11.976
(2)  patched -mm, 128KB readahead:  3.117      2.448     47.33%     11.607

The patched kernel (2) has the smallest total time. It has no cache-hit
overhead and less I/O block time (thanks to async readahead). Here the
I/O size makes little difference, since there is only a single stream.

Note that (1-1)'s real I/O size is 64KB and (1-2)'s real I/O size is
128KB, since half of the read-around pages will be cache hits.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---

---
 mm/filemap.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- linux-2.6.24-rc5-mm1.orig/mm/filemap.c
+++ linux-2.6.24-rc5-mm1/mm/filemap.c
@@ -1318,7 +1318,8 @@ static void do_sync_mmap_readahead(struc
 	if (VM_RandomReadHint(vma))
 		return;
 
-	if (VM_SequentialReadHint(vma)) {
+	if (VM_SequentialReadHint(vma) ||
+			offset - 1 == (ra->prev_pos >> PAGE_CACHE_SHIFT)) {
 		page_cache_sync_readahead(mapping, ra, file, offset, 1);
 		return;
 	}
@@ -1360,7 +1361,8 @@ static void do_async_mmap_readahead(stru
 		return;
 	if (ra->mmap_miss > 0)
 		ra->mmap_miss--;
-	if (PageReadahead(page))
+	if (PageReadahead(page) &&
+			offset == ra->start + ra->size - ra->async_size)
 		page_cache_async_readahead(mapping, ra, file, page, offset, 1);
 }
 

-- 


