All of lore.kernel.org
* Setting of the PageReadahead bit
@ 2011-06-03 11:55 Matthew Wilcox
  2011-06-04  3:15 ` Hugh Dickins
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2011-06-03 11:55 UTC (permalink / raw)
  To: linux-mm

The exact definition of PageReadahead doesn't seem to be documented
anywhere.  I'm assuming it means "This page was not directly requested;
it is being read for prefetching purposes", exactly like the READA
semantics.

If my interpretation is correct, then the implementation in 
__do_page_cache_readahead is wrong:

		if (page_idx == nr_to_read - lookahead_size)
			SetPageReadahead(page);

It'll only set the PageReadahead bit on one page.  The patch below fixes
this ... if my understanding is correct.

If my understanding is wrong, then how are readpage/readpages
implementations supposed to know that the VM is only prefetching these
pages, and they're not as important as metadata (dependent) reads?

commit 7b1a00ae0ecc327bab9ce467dcd7bd7fe2a31fab
Author: Matthew Wilcox <matthew.r.wilcox@intel.com>
Date:   Fri Jun 3 03:49:45 2011 -0400

    mm: Fix PageReadahead flag setting in readahead code
    
    The current code sets the PageReadahead bit on at most one page in each
    batch.  Worse, it can set the PageReadahead bit on the exact page that
    was requested.  What it should be doing is setting the PageReadahead
    bit on every page except the one which is really demanded.
    
    Passing the page offset to the __do_page_cache_readahead() function
    lets it know which page to not set the Readahead bit on.  That implies
    passing the offset into ra_submit as well.

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9670f71..3e32a17 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1480,6 +1480,7 @@ void page_cache_async_readahead(struct address_space *mapping,
 
 unsigned long max_sane_readahead(unsigned long nr);
 unsigned long ra_submit(struct file_ra_state *ra,
+			pgoff_t offset,
 			struct address_space *mapping,
 			struct file *filp);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index d7b1057..1a1ab1b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1594,7 +1594,7 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
 	ra->start = max_t(long, 0, offset - ra_pages / 2);
 	ra->size = ra_pages;
 	ra->async_size = ra_pages / 4;
-	ra_submit(ra, mapping, file);
+	ra_submit(ra, offset, mapping, file);
 }
 
 /*
diff --git a/mm/readahead.c b/mm/readahead.c
index 867f9dd..9202533 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -150,7 +150,7 @@ out:
 static int
 __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			pgoff_t offset, unsigned long nr_to_read,
-			unsigned long lookahead_size)
+			pgoff_t wanted)
 {
 	struct inode *inode = mapping->host;
 	struct page *page;
@@ -185,7 +185,7 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			break;
 		page->index = page_offset;
 		list_add(&page->lru, &page_pool);
-		if (page_idx == nr_to_read - lookahead_size)
+		if (page_offset != wanted)
 			SetPageReadahead(page);
 		ret++;
 	}
@@ -210,6 +210,7 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 		pgoff_t offset, unsigned long nr_to_read)
 {
 	int ret = 0;
+	pgoff_t wanted = offset;
 
 	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
 		return -EINVAL;
@@ -223,7 +224,7 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 		if (this_chunk > nr_to_read)
 			this_chunk = nr_to_read;
 		err = __do_page_cache_readahead(mapping, filp,
-						offset, this_chunk, 0);
+						offset, this_chunk, wanted);
 		if (err < 0) {
 			ret = err;
 			break;
@@ -248,13 +249,13 @@ unsigned long max_sane_readahead(unsigned long nr)
 /*
  * Submit IO for the read-ahead request in file_ra_state.
  */
-unsigned long ra_submit(struct file_ra_state *ra,
+unsigned long ra_submit(struct file_ra_state *ra, pgoff_t offset,
 		       struct address_space *mapping, struct file *filp)
 {
 	int actual;
 
 	actual = __do_page_cache_readahead(mapping, filp,
-					ra->start, ra->size, ra->async_size);
+					ra->start, ra->size, offset);
 
 	return actual;
 }
@@ -465,7 +466,8 @@ ondemand_readahead(struct address_space *mapping,
 	 * standalone, small random read
 	 * Read as is, and do not pollute the readahead state.
 	 */
-	return __do_page_cache_readahead(mapping, filp, offset, req_size, 0);
+	return __do_page_cache_readahead(mapping, filp, offset, req_size,
+								offset);
 
 initial_readahead:
 	ra->start = offset;
@@ -483,7 +485,7 @@ readit:
 		ra->size += ra->async_size;
 	}
 
-	return ra_submit(ra, mapping, filp);
+	return ra_submit(ra, offset, mapping, filp);
 }
 
 /**

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href="mailto:dont@kvack.org">email@kvack.org</a>


* Re: Setting of the PageReadahead bit
  2011-06-03 11:55 Setting of the PageReadahead bit Matthew Wilcox
@ 2011-06-04  3:15 ` Hugh Dickins
  2011-06-05  7:54   ` Wu Fengguang
  0 siblings, 1 reply; 3+ messages in thread
From: Hugh Dickins @ 2011-06-04  3:15 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, Wu Fengguang

On Fri, Jun 3, 2011 at 4:55 AM, Matthew Wilcox <willy@linux.intel.com> wrote:
> The exact definition of PageReadahead doesn't seem to be documented
> anywhere.  I'm assuming it means "This page was not directly requested;
> it is being read for prefetching purposes", exactly like the READA
> semantics.
>
> If my interpretation is correct, then the implementation in
> __do_page_cache_readahead is wrong:
>
>                if (page_idx == nr_to_read - lookahead_size)
>                        SetPageReadahead(page);
>
> It'll only set the PageReadahead bit on one page.  The patch below fixes
> this ... if my understanding is correct.

Incorrect I believe: it's a trigger to say, when you get this far,
it's time to think about kicking off the next read.

>
> If my understanding is wrong, then how are readpage/readpages
> implementations supposed to know that the VM is only prefetching these
> pages, and they're not as important as metadata (dependent) reads?

I don't think they do know at present; but I can well imagine there
may be advantage in them knowing.

Hugh



* Re: Setting of the PageReadahead bit
  2011-06-04  3:15 ` Hugh Dickins
@ 2011-06-05  7:54   ` Wu Fengguang
  0 siblings, 0 replies; 3+ messages in thread
From: Wu Fengguang @ 2011-06-05  7:54 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Matthew Wilcox, linux-mm

On Sat, Jun 04, 2011 at 11:15:38AM +0800, Hugh Dickins wrote:
> On Fri, Jun 3, 2011 at 4:55 AM, Matthew Wilcox <willy@linux.intel.com> wrote:
> > The exact definition of PageReadahead doesn't seem to be documented
> > anywhere.  I'm assuming it means "This page was not directly requested;
> > it is being read for prefetching purposes", exactly like the READA
> > semantics.
> >
> > If my interpretation is correct, then the implementation in
> > __do_page_cache_readahead is wrong:
> >
> >                if (page_idx == nr_to_read - lookahead_size)
> >                        SetPageReadahead(page);
> >
> > It'll only set the PageReadahead bit on one page.  The patch below fixes
> > this ... if my understanding is correct.
> 
> Incorrect I believe: it's a trigger to say, when you get this far,
> it's time to think about kicking off the next read.

That's right. PG_readahead is set to trigger the _next_ ASYNC readahead.

> >
> > If my understanding is wrong, then how are readpage/readpages
> > implementations supposed to know that the VM is only prefetching these
> > pages, and they're not as important as metadata (dependent) reads?
> 
> I don't think they do know at present; but I can well imagine there
> may be advantage in them knowing.

__do_page_cache_readahead() doesn't know whether the _current_ readahead
IO is an ASYNC one.

page_cache_async_readahead() calls ondemand_readahead() with
hit_readahead_marker=true. It's possible to further pass this
information into __do_page_cache_readahead() and ->readpage/readpages.

Thanks,
Fengguang



end of thread, other threads:[~2011-06-05  7:54 UTC | newest]
