* [PATCH 0/6] Adaptive readahead updates 3
From: Wu Fengguang @ 2006-06-25 13:07 UTC
  To: Andrew Morton; +Cc: linux-kernel

Andrew,

Most patches here are focused on separating out readahead overheads.

- apply after readahead-kconfig-options.patch
[PATCH 3/6] readahead: kconfig option READAHEAD_ALLOW_OVERHEADS
[PATCH 4/6] readahead: kconfig option READAHEAD_SMOOTH_AGING

- after readahead-context-based-method-fix-remain-counting.patch
[PATCH 1/6] readahead: context based method - slow start

- after readahead-backward-prefetching-method-add-use-case-comment.patch
[PATCH 2/6] readahead: backward prefetching method fix

- after readahead-call-scheme-no-fastcall-for-readahead_cache_hit.patch
[PATCH 5/6] readahead: kconfig option READAHEAD_HIT_FEEDBACK

- after readahead-remove-size-limit-on-read_ahead_kb.patch
[PATCH 6/6] readahead: remove the size limit of max_sectors_kb on read_ahead_kb


Thanks,
Fengguang Wu
-
Dept. Automation                University of Science and Technology of China


* [PATCH 1/6] readahead: context based method - slow start
From: Wu Fengguang @ 2006-06-25 13:07 UTC
  To: Andrew Morton; +Cc: linux-kernel, Wu Fengguang

[-- Attachment #1: readahead-context-fix.patch --]
[-- Type: text/plain, Size: 716 bytes --]

The context method leads to noticeable overhead on sparse random reads.
Having the readahead window start slowly makes it much better.

Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---


--- linux-2.6.17-mm2.orig/mm/readahead.c
+++ linux-2.6.17-mm2/mm/readahead.c
@@ -1548,6 +1548,11 @@ try_context_based_readahead(struct addre
 			return -1;
 	} else if (prev_page || probe_page(mapping, index - 1)) {
 		ra_index = index;
+		/*
+		 * Slow start of readahead window.
+		 * It helps avoid most readahead misses on sparse random reads.
+		 */
+		ra_min = readahead_hit_rate;
 	} else if (readahead_hit_rate > 1) {
 		ra_index = find_segtail_backward(mapping, index,
 						readahead_hit_rate + ra_min);

--


* [PATCH 2/6] readahead: backward prefetching method fix
From: Wu Fengguang @ 2006-06-25 13:07 UTC
  To: Andrew Morton; +Cc: linux-kernel, Wu Fengguang

[-- Attachment #1: readahead-backward-prefetching-fix.patch --]
[-- Type: text/plain, Size: 1870 bytes --]

- The backward prefetching method fails near the start of the file.  Fix
  it (see the underflow sketch below).
- Make it scale up more quickly by adding ra_min to ra_size.
- Do not discount readahead_hit_rate; that is not a documented behavior.

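The start-of-file fix is about unsigned wrap-around: pgoff_t is unsigned,
so the old `end_index - ra_size' underflows whenever the window is larger
than the distance to the file head.  A minimal userspace illustration
(the values are made up):

#include <stdio.h>

int main(void)
{
	unsigned long end_index = 3;	/* read near the file head */
	unsigned long ra_size = 8;	/* window larger than the offset */

	/* old code: wraps around to a huge bogus page index */
	unsigned long bad = end_index - ra_size;
	/* patched code: clamp the window start at page 0 */
	unsigned long good = end_index > ra_size ? end_index - ra_size : 0;

	printf("old: %lu  fixed: %lu\n", bad, good);
	return 0;
}
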
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---


--- linux-2.6.17-mm2.orig/mm/readahead.c
+++ linux-2.6.17-mm2/mm/readahead.c
@@ -1636,8 +1636,9 @@ initial_readahead(struct address_space *
  * Important for certain scientific arenas(i.e. structural analysis).
  */
 static int
-try_read_backward(struct file_ra_state *ra, pgoff_t begin_index,
-			unsigned long ra_size, unsigned long ra_max)
+try_read_backward(struct file_ra_state *ra,
+			pgoff_t begin_index, unsigned long ra_size,
+			unsigned long ra_min, unsigned long ra_max)
 {
 	pgoff_t end_index;
 
@@ -1646,11 +1647,11 @@ try_read_backward(struct file_ra_state *
 		return 0;
 
 	if ((ra->flags & RA_CLASS_MASK) == RA_CLASS_BACKWARD &&
-					ra_has_index(ra, ra->prev_page)) {
-		ra_size += 2 * ra->hit0;
+		ra_has_index(ra, ra->prev_page) && ra_cache_hit_ok(ra)) {
+		ra_size += ra_min + 2 * ra_readahead_size(ra);
 		end_index = ra->la_index;
 	} else {
-		ra_size += ra_size + ra_size * (readahead_hit_rate - 1) / 2;
+		ra_size += ra_size * readahead_hit_rate;
 		end_index = ra->prev_page;
 	}
 
@@ -1661,7 +1662,7 @@ try_read_backward(struct file_ra_state *
 	if (end_index > begin_index + ra_size)
 		return 0;
 
-	begin_index = end_index - ra_size;
+	begin_index = end_index > ra_size ? end_index - ra_size : 0;
 
 	ra_set_class(ra, RA_CLASS_BACKWARD);
 	ra_set_index(ra, begin_index, begin_index);
@@ -1864,7 +1865,7 @@ page_cache_readahead_adaptive(struct add
 	 * Backward read-ahead.
 	 */
 	if (!page && begin_index == index &&
-				try_read_backward(ra, index, size, ra_max))
+				try_read_backward(ra, index, size, ra_min, ra_max))
 		return ra_dispatch(ra, mapping, filp);
 
 	/*

--


* [PATCH 3/6] readahead: kconfig option READAHEAD_ALLOW_OVERHEADS
From: Wu Fengguang @ 2006-06-25 13:07 UTC
  To: Andrew Morton; +Cc: linux-kernel, Wu Fengguang

[-- Attachment #1: readahead-kconfig-allow-overheads.patch --]
[-- Type: text/plain, Size: 843 bytes --]

Introduce a kconfig option READAHEAD_ALLOW_OVERHEADS to let users opt in
to the extra features that carry overheads.

Such features will be disabled by default.

Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---

--- linux-2.6.17-mm2.orig/mm/Kconfig
+++ linux-2.6.17-mm2/mm/Kconfig
@@ -182,10 +182,15 @@ config ADAPTIVE_READAHEAD
 	  It is known to work well for many desktops, file servers and
 	  postgresql databases. Say Y to try it out for yourself.
 
+config READAHEAD_ALLOW_OVERHEADS
+	bool "Allow extra features with overheads"
+	default n
+	depends on ADAPTIVE_READAHEAD
+
 config DEBUG_READAHEAD
 	bool "Readahead debug and accounting"
 	default y
-	depends on ADAPTIVE_READAHEAD
+	depends on READAHEAD_ALLOW_OVERHEADS
 	select DEBUG_FS
 	help
 	  This option injects extra code to dump detailed debug traces and do

--


* [PATCH 5/6] readahead: kconfig option READAHEAD_HIT_FEEDBACK
From: Wu Fengguang @ 2006-06-25 13:07 UTC
  To: Andrew Morton; +Cc: linux-kernel, Wu Fengguang

[-- Attachment #1: readahead-kconfig-rahit-feedback.patch --]
[-- Type: text/plain, Size: 5585 bytes --]

Introduce a kconfig option READAHEAD_HIT_FEEDBACK to let users disable
the readahead hit feedback feature.

The readahead hit accounting brings per-page overheads.  However, it is
necessary for the onseek method, and possibly for a strides method in the
future.  (A sketch of the feedback test follows below.)

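The feedback drives the ra_cache_hit_ok() test touched below.  As a
hedged standalone restatement (simplified names, not the kernel function
itself; it assumes lookahead_index - la_index counts the pages submitted
so far):

#include <stdio.h>

/* A readahead sequence is judged effective when at least
 * 1/readahead_hit_rate of the pages submitted so far were actually
 * accessed.  With hit feedback compiled out, the patch makes the
 * kernel-side check always pass. */
static int hit_ok(unsigned long hits, unsigned long submitted,
		  unsigned long readahead_hit_rate)
{
	return hits * readahead_hit_rate >= submitted;
}

int main(void)
{
	/* e.g. 4 of 16 submitted pages hit; tolerate 1 hit per 4 pages */
	printf("%d\n", hit_ok(4, 16, 4));	/* 1: keep readahead on */
	printf("%d\n", hit_ok(1, 16, 4));	/* 0: sequence looks bad */
	return 0;
}
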
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---


--- linux-2.6.17-mm2.orig/mm/Kconfig
+++ linux-2.6.17-mm2/mm/Kconfig
@@ -215,6 +215,16 @@ config DEBUG_READAHEAD
 
 	  Say N for production servers.
 
+config READAHEAD_HIT_FEEDBACK
+	bool "Readahead hit feedback"
+	default y
+	depends on READAHEAD_ALLOW_OVERHEADS
+	help
+	  Enable readahead hit feedback.
+
+	  It is not needed in normal cases, except for detecting the
+	  seek-and-read pattern.
+
 config READAHEAD_SMOOTH_AGING
 	bool "Fine grained readahead aging"
 	default n
--- linux-2.6.17-mm2.orig/include/linux/fs.h
+++ linux-2.6.17-mm2/include/linux/fs.h
@@ -648,6 +648,13 @@ struct file_ra_state {
 			pgoff_t readahead_index;
 
 			/*
+			 * Snapshot of the (node's) read-ahead aging value
+			 * on time of I/O submission.
+			 */
+			unsigned long age;
+
+#ifdef CONFIG_READAHEAD_HIT_FEEDBACK
+			/*
 			 * Read-ahead hits.
 			 * 	i.e. # of distinct read-ahead pages accessed.
 			 *
@@ -660,12 +667,7 @@ struct file_ra_state {
 			u16	hit1;	/* for the current sequence */
 			u16	hit2;	/* for the previous sequence */
 			u16	hit3;	/* for the prev-prev sequence */
-
-			/*
-			 * Snapshot of the (node's) read-ahead aging value
-			 * on time of I/O submission.
-			 */
-			unsigned long age;
+#endif
 		};
 #endif
 	};
--- linux-2.6.17-mm2.orig/mm/readahead.c
+++ linux-2.6.17-mm2/mm/readahead.c
@@ -933,8 +933,12 @@ static unsigned long ra_invoke_interval(
  */
 static int ra_cache_hit_ok(struct file_ra_state *ra)
 {
+#ifdef CONFIG_READAHEAD_HIT_FEEDBACK
 	return ra->hit0 * readahead_hit_rate >=
 					(ra->lookahead_index - ra->la_index);
+#else
+	return 1;
+#endif
 }
 
 /*
@@ -968,6 +972,7 @@ static void ra_set_class(struct file_ra_
 
 	ra->flags = flags | old_ra_class | ra_class;
 
+#ifdef CONFIG_READAHEAD_HIT_FEEDBACK
 	/*
 	 * Add request-hit up to sequence-hit and reset the former.
 	 */
@@ -984,6 +989,7 @@ static void ra_set_class(struct file_ra_
 		ra->hit2 = ra->hit1;
 		ra->hit1 = 0;
 	}
+#endif
 }
 
 /*
@@ -1684,6 +1690,7 @@ static int
 try_readahead_on_seek(struct file_ra_state *ra, pgoff_t index,
 			unsigned long ra_size, unsigned long ra_max)
 {
+#ifdef CONFIG_READAHEAD_HIT_FEEDBACK
 	unsigned long hit0 = ra->hit0;
 	unsigned long hit1 = ra->hit1 + hit0;
 	unsigned long hit2 = ra->hit2;
@@ -1712,6 +1719,9 @@ try_readahead_on_seek(struct file_ra_sta
 	ra_set_size(ra, ra_size, 0);
 
 	return 1;
+#else
+	return 0;
+#endif
 }
 
 /*
@@ -1739,7 +1749,7 @@ thrashing_recovery_readahead(struct addr
 		ra_size = ra->ra_index - index;
 	else {
 		/* After thrashing, we know the exact thrashing-threshold. */
-		ra_size = ra->hit0;
+		ra_size = index - ra->ra_index;
 		update_ra_thrash_bytes(mapping->backing_dev_info, ra_size);
 
 		/* And we'd better be a bit conservative. */
@@ -1908,6 +1918,7 @@ readit:
 	return size;
 }
 
+#if defined(CONFIG_READAHEAD_HIT_FEEDBACK) || defined(CONFIG_DEBUG_READAHEAD)
 /**
  * readahead_cache_hit - adaptive read-ahead feedback function
  * @ra: file_ra_state which holds the readahead state
@@ -1927,13 +1938,16 @@ void readahead_cache_hit(struct file_ra_
 	if (!ra_has_index(ra, page->index))
 		return;
 
+#ifdef CONFIG_READAHEAD_HIT_FEEDBACK
 	ra->hit0++;
+#endif
 
 	if (page->index >= ra->ra_index)
 		ra_account(ra, RA_EVENT_READAHEAD_HIT, 1);
 	else
 		ra_account(ra, RA_EVENT_READAHEAD_HIT, -1);
 }
+#endif
 
 /*
  * When closing a normal readonly file,
@@ -1949,7 +1963,6 @@ void readahead_close(struct file *file)
 	struct address_space *mapping = inode->i_mapping;
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 	unsigned long pos = file->f_pos;	/* supposed to be small */
-	unsigned long pgrahit = file->f_ra.hit0;
 	unsigned long pgcached = mapping->nrpages;
 	unsigned long pgaccess;
 
@@ -1959,7 +1972,12 @@ void readahead_close(struct file *file)
 	if (pgcached > bdi->ra_pages0)		/* excessive reads */
 		return;
 
-	pgaccess = max(pgrahit, 1 + pos / PAGE_CACHE_SIZE);
+	pgaccess = 1 + pos / PAGE_CACHE_SIZE;
+#ifdef CONFIG_READAHEAD_HIT_FEEDBACK
+	if (pgaccess < file->f_ra.hit0)
+		pgaccess = file->f_ra.hit0;
+#endif
+
 	if (pgaccess >= pgcached) {
 		if (bdi->ra_expect_bytes < bdi->ra_pages0 * PAGE_CACHE_SIZE)
 			bdi->ra_expect_bytes += pgcached * PAGE_CACHE_SIZE / 8;
@@ -1980,12 +1998,11 @@ void readahead_close(struct file *file)
 
 		debug_inc(initial_ra_miss);
 		dprintk("initial_ra_miss on file %s "
-				"size %lluK cached %luK hit %luK "
+				"size %lluK cached %luK "
 				"pos %lu by %s(%d)\n",
 				file->f_dentry->d_name.name,
 				i_size_read(inode) / 1024,
 				pgcached << (PAGE_CACHE_SHIFT - 10),
-				pgrahit << (PAGE_CACHE_SHIFT - 10),
 				pos,
 				current->comm, current->pid);
 	}
--- linux-2.6.17-mm2.orig/include/linux/mm.h
+++ linux-2.6.17-mm2/include/linux/mm.h
@@ -998,7 +998,16 @@ page_cache_readahead_adaptive(struct add
 			struct file_ra_state *ra, struct file *filp,
 			struct page *prev_page, struct page *page,
 			pgoff_t first_index, pgoff_t index, pgoff_t last_index);
+
+#if defined(CONFIG_READAHEAD_HIT_FEEDBACK) || defined(CONFIG_DEBUG_READAHEAD)
 void readahead_cache_hit(struct file_ra_state *ra, struct page *page);
+#else
+static inline void readahead_cache_hit(struct file_ra_state *ra,
+					struct page *page)
+{
+}
+#endif
+
 
 #ifdef CONFIG_ADAPTIVE_READAHEAD
 extern int readahead_ratio;

--


* [PATCH 6/6] readahead: remove the size limit of max_sectors_kb on read_ahead_kb
From: Wu Fengguang @ 2006-06-25 13:07 UTC
  To: Andrew Morton; +Cc: linux-kernel, Wu Fengguang

[-- Attachment #1: readahead-size-limited-by-max-sectors-fix.patch --]
[-- Type: text/plain, Size: 1074 bytes --]

Remove the correlation between max_sectors_kb and read_ahead_kb.
It is unnecessary to reduce the readahead size when setting
max_sectors_kb: a readahead window larger than the request size limit
simply gets split into several requests by the block layer.  (See the
unit-conversion sketch below.)

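For readers of the hunk below, the shifts encode unit conversions; a
small sketch decoding them (standard 512-byte sectors; PAGE_CACHE_SHIFT
assumed to be 12, i.e. 4K pages, for illustration):

#include <stdio.h>

#define PAGE_CACHE_SHIFT 12	/* assumption: 4K pages */

int main(void)
{
	unsigned long max_sectors = 256;		 /* 512-byte sectors */
	unsigned long max_sectors_kb = max_sectors >> 1; /* = 128 KB */
	unsigned long ra_pages = 32;			 /* readahead pages */
	unsigned long ra_kb = ra_pages << (PAGE_CACHE_SHIFT - 10); /* = 128 KB */

	printf("max_sectors_kb=%lu ra_kb=%lu\n", max_sectors_kb, ra_kb);
	return 0;
}
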
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---


--- linux-2.6.17-mm2.orig/block/ll_rw_blk.c
+++ linux-2.6.17-mm2/block/ll_rw_blk.c
@@ -3851,25 +3851,11 @@ queue_max_sectors_store(struct request_q
 			max_hw_sectors_kb = q->max_hw_sectors >> 1,
 			page_kb = 1 << (PAGE_CACHE_SHIFT - 10);
 	ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
-	int ra_kb;
 
 	if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
 		return -EINVAL;
-	/*
-	 * Take the queue lock to update the readahead and max_sectors
-	 * values synchronously:
-	 */
-	spin_lock_irq(q->queue_lock);
-	/*
-	 * Trim readahead window as well, if necessary:
-	 */
-	ra_kb = q->backing_dev_info.ra_pages << (PAGE_CACHE_SHIFT - 10);
-	if (ra_kb > max_sectors_kb)
-		q->backing_dev_info.ra_pages =
-				max_sectors_kb >> (PAGE_CACHE_SHIFT - 10);
 
 	q->max_sectors = max_sectors_kb << 1;
-	spin_unlock_irq(q->queue_lock);
 
 	return ret;
 }

--


* Re: [PATCH 0/6] Adaptive readahead updates 3
From: Wu Fengguang @ 2006-06-26  1:52 UTC
  To: Andrew Morton; +Cc: linux-kernel

Andrew,

On Sun, Jun 25, 2006 at 09:07:04PM +0800, Wu Fengguang wrote:
> Most patches here are focused on separating out readahead overheads.

Here are some notes that I forgot to include in the mail:

The three patches:
        [PATCH 3/6] readahead: kconfig option READAHEAD_ALLOW_OVERHEADS
        [PATCH 4/6] readahead: kconfig option READAHEAD_SMOOTH_AGING
        [PATCH 5/6] readahead: kconfig option READAHEAD_HIT_FEEDBACK

make the menuconfig look like this:
        [*] Adaptive file readahead (EXPERIMENTAL)
        [*]   Allow extra features with overheads
        [*]     Readahead debug and accounting (NEW)
        [*]     Readahead hit feedback (NEW)
        [ ]     Fine grained readahead aging (NEW)

With all three new options disabled (the defaults), it becomes comparable
in efficiency to the stock readahead when reading big files:

Summary:
                      user       sys       cpu         total
ARA     avg           0.13       5.42      91.62%      6.02 
STOCK   avg           0.13       5.47      91.64%      6.09

Details:

ARA
cp work/sparse /dev/null  0.13s user 5.44s system 92% cpu 5.987 total
cp work/sparse /dev/null  0.11s user 5.42s system 91% cpu 6.028 total
cp work/sparse /dev/null  0.14s user 5.47s system 92% cpu 6.087 total
cp work/sparse /dev/null  0.13s user 5.46s system 91% cpu 6.087 total
cp work/sparse /dev/null  0.13s user 5.45s system 91% cpu 6.070 total
cp work/sparse /dev/null  0.13s user 5.41s system 92% cpu 6.003 total
cp work/sparse /dev/null  0.13s user 5.42s system 91% cpu 6.036 total
cp work/sparse /dev/null  0.13s user 5.36s system 91% cpu 6.003 total
cp work/sparse /dev/null  0.14s user 5.39s system 92% cpu 6.003 total
cp work/sparse /dev/null  0.14s user 5.42s system 92% cpu 6.028 total
cp work/sparse /dev/null  0.12s user 5.36s system 92% cpu 5.937 total
cp work/sparse /dev/null  0.12s user 5.36s system 92% cpu 5.961 total
cp work/sparse /dev/null  0.13s user 5.47s system 92% cpu 6.062 total

STOCK
cp work/sparse /dev/null  0.13s user 5.49s system 92% cpu 6.068 total
cp work/sparse /dev/null  0.15s user 5.38s system 92% cpu 6.012 total
cp work/sparse /dev/null  0.12s user 5.49s system 91% cpu 6.112 total
cp work/sparse /dev/null  0.12s user 5.52s system 92% cpu 6.103 total
cp work/sparse /dev/null  0.13s user 5.57s system 91% cpu 6.203 total
cp work/sparse /dev/null  0.11s user 5.45s system 92% cpu 6.037 total
cp work/sparse /dev/null  0.13s user 5.52s system 92% cpu 6.120 total
cp work/sparse /dev/null  0.14s user 5.43s system 91% cpu 6.070 total
cp work/sparse /dev/null  0.12s user 5.49s system 92% cpu 6.078 total
cp work/sparse /dev/null  0.13s user 5.51s system 92% cpu 6.128 total
cp work/sparse /dev/null  0.13s user 5.45s system 92% cpu 6.061 total
cp work/sparse /dev/null  0.15s user 5.40s system 91% cpu 6.037 total
cp work/sparse /dev/null  0.14s user 5.49s system 92% cpu 6.086 total
cp work/sparse /dev/null  0.13s user 5.46s system 91% cpu 6.106 total

Thanks,
Wu


* [updated PATCH 1/6] readahead: context based method - slow start
From: Wu Fengguang @ 2006-06-26  2:35 UTC
  To: Andrew Morton; +Cc: linux-kernel

The context method leads to noticeable overhead (readahead misses)
on very sparse random reads.

Having the readahead window start slowly makes it much better, while
still starting up quickly if the user prefers sparse readahead.  (A toy
model of the miss cost follows the benchmark numbers below.)

Benchmarks of randomly reading 100,000 pages from a 1,000,000-page _sparse_ file:

        ARA before patch          ARA                STOCK
        ================    ================    ================
real    2.779s    2.782s    2.552s    2.606s    2.477s    2.521s
user    1.120s    1.184s    1.133s    1.155s    1.097s    1.159s
sys     1.248s    1.208s    1.093s    1.086s    1.079s    1.064s

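To see why a large initial window hurts this workload, here is a rough
userspace model (an illustrative sketch only, not the kernel algorithm:
it merely counts how often a random read happens to land next to an
already-cached page, which is when the context method fires, and charges
the full window each time):

#include <stdio.h>
#include <stdlib.h>

#define NPAGES	1000000		/* file size, matching the benchmark */
#define NREAD	100000		/* random page reads */

static unsigned char cached[NPAGES];

/* Pages speculatively read for a given initial window size. */
static unsigned long wasted(unsigned long window)
{
	unsigned long w = 0;
	long i;

	for (i = 0; i < NPAGES; i++)
		cached[i] = 0;
	srand(42);
	for (i = 0; i < NREAD; i++) {
		long idx = rand() % NPAGES;

		/* previous page cached => looks sequential to the
		 * context method, triggering a readahead window */
		if (idx > 0 && cached[idx - 1])
			w += window;
		cached[idx] = 1;
	}
	return w;
}

int main(void)
{
	printf("32-page initial window: ~%lu wasted pages\n", wasted(32));
	printf(" 1-page initial window: ~%lu wasted pages\n", wasted(1));
	return 0;
}
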
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---

--- linux-2.6.17-mm2.orig/mm/readahead.c
+++ linux-2.6.17-mm2/mm/readahead.c
@@ -1548,6 +1548,12 @@ try_context_based_readahead(struct addre
 			return -1;
 	} else if (prev_page || probe_page(mapping, index - 1)) {
 		ra_index = index;
+		/*
+		 * Slow start of readahead window.
+		 * It helps avoid most readahead misses on sparse random reads.
+		 */
+		if (readahead_hit_rate == 1)
+			ra_min = 1;
 	} else if (readahead_hit_rate > 1) {
 		ra_index = find_segtail_backward(mapping, index,
 						readahead_hit_rate + ra_min);

