All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 -mm] make swapin readahead skip over holes
@ 2012-01-11 19:30 ` Rik van Riel
  0 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-01-11 19:30 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, KOSAKI Motohiro, akpm, mel, hannes, minchan.kim

Ever since abandoning the virtual scan of processes, for scalability
reasons, swap space has been a little more fragmented than before.
This can lead to the situation where a large memory user is killed,
swap space ends up full of "holes" and swapin readahead is totally
ineffective.

On my home system, after killing a leaky firefox it took over an
hour to page just under 2GB of memory back in, slowing the virtual
machines down to a crawl.

This patch makes swapin readahead simply skip over holes, instead
of stopping at them.  This allows the system to swap things back in
at rates of several MB/second, instead of a few hundred kB/second.

The checks done in valid_swaphandles are already done in 
read_swap_cache_async as well, allowing us to remove a fair amount
of code.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
-v2: cleanups suggested by Mel Gorman
     skip swap offset zero

 include/linux/swap.h |    2 +-
 mm/swap_state.c      |   17 ++++----------
 mm/swapfile.c        |   59 +++++++++++++------------------------------------
 3 files changed, 22 insertions(+), 56 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1e22e12..6e1282e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -328,7 +328,7 @@ extern long total_swap_pages;
 extern void si_swapinfo(struct sysinfo *);
 extern swp_entry_t get_swap_page(void);
 extern swp_entry_t get_swap_page_of_type(int);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
+extern void get_swap_cluster(swp_entry_t, unsigned long *, unsigned long *);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index ea6b32d..3ec4886 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -372,25 +372,18 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr)
 {
-	int nr_pages;
 	struct page *page;
-	unsigned long offset;
+	unsigned long offset = swp_offset(entry);
 	unsigned long end_offset;
 
-	/*
-	 * Get starting offset for readaround, and number of pages to read.
-	 * Adjust starting address by readbehind (for NUMA interleave case)?
-	 * No, it's very unlikely that swap layout would follow vma layout,
-	 * more likely that neighbouring swap pages came from the same node:
-	 * so use the same "addr" to choose the same node for each swap read.
-	 */
-	nr_pages = valid_swaphandles(entry, &offset);
-	for (end_offset = offset + nr_pages; offset < end_offset; offset++) {
+	get_swap_cluster(entry, &offset, &end_offset);
+
+	for (; offset <= end_offset ; offset++) {
 		/* Ok, do the async read-ahead now */
 		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
 						gfp_mask, vma, addr);
 		if (!page)
-			break;
+			continue;
 		page_cache_release(page);
 	}
 	lru_add_drain();	/* Push any new pages onto the LRU now */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index b1cd120..f09182e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2289,55 +2289,28 @@ int swapcache_prepare(swp_entry_t entry)
 }
 
 /*
- * swap_lock prevents swap_map being freed. Don't grab an extra
- * reference on the swaphandle, it doesn't matter if it becomes unused.
+ * Return a swap cluster sized and aligned block around offset.
  */
-int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
+void get_swap_cluster(swp_entry_t entry, unsigned long *begin,
+			unsigned long *end)
 {
 	struct swap_info_struct *si;
-	int our_page_cluster = page_cluster;
-	pgoff_t target, toff;
-	pgoff_t base, end;
-	int nr_pages = 0;
-
-	if (!our_page_cluster)	/* no readahead */
-		return 0;
-
-	si = swap_info[swp_type(entry)];
-	target = swp_offset(entry);
-	base = (target >> our_page_cluster) << our_page_cluster;
-	end = base + (1 << our_page_cluster);
-	if (!base)		/* first page is swap header */
-		base++;
+	unsigned long offset = swp_offset(entry);
+	unsigned long mask = (1 << page_cluster) - 1;
 
 	spin_lock(&swap_lock);
-	if (end > si->max)	/* don't go beyond end of map */
-		end = si->max;
-
-	/* Count contiguous allocated slots above our target */
-	for (toff = target; ++toff < end; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (swap_count(si->swap_map[toff]) == SWAP_MAP_BAD)
-			break;
-	}
-	/* Count contiguous allocated slots below our target */
-	for (toff = target; --toff >= base; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (swap_count(si->swap_map[toff]) == SWAP_MAP_BAD)
-			break;
-	}
+	si = swap_info[swp_type(entry)];
+	/* Round the begin down to a page_cluster boundary. */
+	offset &= ~mask;
+	if (!offset)	/* First page is swap header. */
+		offset++;
+	*begin = offset;
+	/* Round the end up, but not beyond the end of the swap device. */
+	offset |= mask;
+	if (offset > si->max)
+		offset = si->max;
+	*end = offset;
 	spin_unlock(&swap_lock);
-
-	/*
-	 * Indicate starting offset, and return number of pages to get:
-	 * if only 1, say 0, since there's then no readahead to be done.
-	 */
-	*offset = ++toff;
-	return nr_pages? ++nr_pages: 0;
 }
 
 /*


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 -mm] make swapin readahead skip over holes
@ 2012-01-11 19:30 ` Rik van Riel
  0 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-01-11 19:30 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, KOSAKI Motohiro, akpm, mel, hannes, minchan.kim

Ever since abandoning the virtual scan of processes, for scalability
reasons, swap space has been a little more fragmented than before.
This can lead to the situation where a large memory user is killed,
swap space ends up full of "holes" and swapin readahead is totally
ineffective.

On my home system, after killing a leaky firefox it took over an
hour to page just under 2GB of memory back in, slowing the virtual
machines down to a crawl.

This patch makes swapin readahead simply skip over holes, instead
of stopping at them.  This allows the system to swap things back in
at rates of several MB/second, instead of a few hundred kB/second.

The checks done in valid_swaphandles are already done in 
read_swap_cache_async as well, allowing us to remove a fair amount
of code.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
-v2: cleanups suggested by Mel Gorman
     skip swap offset zero

 include/linux/swap.h |    2 +-
 mm/swap_state.c      |   17 ++++----------
 mm/swapfile.c        |   59 +++++++++++++------------------------------------
 3 files changed, 22 insertions(+), 56 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1e22e12..6e1282e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -328,7 +328,7 @@ extern long total_swap_pages;
 extern void si_swapinfo(struct sysinfo *);
 extern swp_entry_t get_swap_page(void);
 extern swp_entry_t get_swap_page_of_type(int);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
+extern void get_swap_cluster(swp_entry_t, unsigned long *, unsigned long *);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index ea6b32d..3ec4886 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -372,25 +372,18 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr)
 {
-	int nr_pages;
 	struct page *page;
-	unsigned long offset;
+	unsigned long offset = swp_offset(entry);
 	unsigned long end_offset;
 
-	/*
-	 * Get starting offset for readaround, and number of pages to read.
-	 * Adjust starting address by readbehind (for NUMA interleave case)?
-	 * No, it's very unlikely that swap layout would follow vma layout,
-	 * more likely that neighbouring swap pages came from the same node:
-	 * so use the same "addr" to choose the same node for each swap read.
-	 */
-	nr_pages = valid_swaphandles(entry, &offset);
-	for (end_offset = offset + nr_pages; offset < end_offset; offset++) {
+	get_swap_cluster(entry, &offset, &end_offset);
+
+	for (; offset <= end_offset ; offset++) {
 		/* Ok, do the async read-ahead now */
 		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
 						gfp_mask, vma, addr);
 		if (!page)
-			break;
+			continue;
 		page_cache_release(page);
 	}
 	lru_add_drain();	/* Push any new pages onto the LRU now */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index b1cd120..f09182e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2289,55 +2289,28 @@ int swapcache_prepare(swp_entry_t entry)
 }
 
 /*
- * swap_lock prevents swap_map being freed. Don't grab an extra
- * reference on the swaphandle, it doesn't matter if it becomes unused.
+ * Return a swap cluster sized and aligned block around offset.
  */
-int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
+void get_swap_cluster(swp_entry_t entry, unsigned long *begin,
+			unsigned long *end)
 {
 	struct swap_info_struct *si;
-	int our_page_cluster = page_cluster;
-	pgoff_t target, toff;
-	pgoff_t base, end;
-	int nr_pages = 0;
-
-	if (!our_page_cluster)	/* no readahead */
-		return 0;
-
-	si = swap_info[swp_type(entry)];
-	target = swp_offset(entry);
-	base = (target >> our_page_cluster) << our_page_cluster;
-	end = base + (1 << our_page_cluster);
-	if (!base)		/* first page is swap header */
-		base++;
+	unsigned long offset = swp_offset(entry);
+	unsigned long mask = (1 << page_cluster) - 1;
 
 	spin_lock(&swap_lock);
-	if (end > si->max)	/* don't go beyond end of map */
-		end = si->max;
-
-	/* Count contiguous allocated slots above our target */
-	for (toff = target; ++toff < end; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (swap_count(si->swap_map[toff]) == SWAP_MAP_BAD)
-			break;
-	}
-	/* Count contiguous allocated slots below our target */
-	for (toff = target; --toff >= base; nr_pages++) {
-		/* Don't read in free or bad pages */
-		if (!si->swap_map[toff])
-			break;
-		if (swap_count(si->swap_map[toff]) == SWAP_MAP_BAD)
-			break;
-	}
+	si = swap_info[swp_type(entry)];
+	/* Round the begin down to a page_cluster boundary. */
+	offset &= ~mask;
+	if (!offset)	/* First page is swap header. */
+		offset++;
+	*begin = offset;
+	/* Round the end up, but not beyond the end of the swap device. */
+	offset |= mask;
+	if (offset > si->max)
+		offset = si->max;
+	*end = offset;
 	spin_unlock(&swap_lock);
-
-	/*
-	 * Indicate starting offset, and return number of pages to get:
-	 * if only 1, say 0, since there's then no readahead to be done.
-	 */
-	*offset = ++toff;
-	return nr_pages? ++nr_pages: 0;
 }
 
 /*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -mm] make swapin readahead skip over holes
  2012-01-11 19:30 ` Rik van Riel
@ 2012-01-11 21:10   ` Johannes Weiner
  -1 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2012-01-11 21:10 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, KOSAKI Motohiro, akpm, mel, minchan.kim

On Wed, Jan 11, 2012 at 02:30:44PM -0500, Rik van Riel wrote:
> Ever since abandoning the virtual scan of processes, for scalability
> reasons, swap space has been a little more fragmented than before.
> This can lead to the situation where a large memory user is killed,
> swap space ends up full of "holes" and swapin readahead is totally
> ineffective.
> 
> On my home system, after killing a leaky firefox it took over an
> hour to page just under 2GB of memory back in, slowing the virtual
> machines down to a crawl.
> 
> This patch makes swapin readahead simply skip over holes, instead
> of stopping at them.  This allows the system to swap things back in
> at rates of several MB/second, instead of a few hundred kB/second.
> 
> The checks done in valid_swaphandles are already done in 
> read_swap_cache_async as well, allowing us to remove a fair amount
> of code.

__swap_duplicate() also checks for whether the offset is within the
swap device range.  Do you think we could remove get_swap_cluster()
altogether and just try reading the aligned page_cluster range?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -mm] make swapin readahead skip over holes
@ 2012-01-11 21:10   ` Johannes Weiner
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2012-01-11 21:10 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, KOSAKI Motohiro, akpm, mel, minchan.kim

On Wed, Jan 11, 2012 at 02:30:44PM -0500, Rik van Riel wrote:
> Ever since abandoning the virtual scan of processes, for scalability
> reasons, swap space has been a little more fragmented than before.
> This can lead to the situation where a large memory user is killed,
> swap space ends up full of "holes" and swapin readahead is totally
> ineffective.
> 
> On my home system, after killing a leaky firefox it took over an
> hour to page just under 2GB of memory back in, slowing the virtual
> machines down to a crawl.
> 
> This patch makes swapin readahead simply skip over holes, instead
> of stopping at them.  This allows the system to swap things back in
> at rates of several MB/second, instead of a few hundred kB/second.
> 
> The checks done in valid_swaphandles are already done in 
> read_swap_cache_async as well, allowing us to remove a fair amount
> of code.

__swap_duplicate() also checks for whether the offset is within the
swap device range.  Do you think we could remove get_swap_cluster()
altogether and just try reading the aligned page_cluster range?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -mm] make swapin readahead skip over holes
  2012-01-11 21:10   ` Johannes Weiner
@ 2012-01-11 21:30     ` Rik van Riel
  -1 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-01-11 21:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, KOSAKI Motohiro, akpm, mel, minchan.kim

On 01/11/2012 04:10 PM, Johannes Weiner wrote:
> On Wed, Jan 11, 2012 at 02:30:44PM -0500, Rik van Riel wrote:
>> Ever since abandoning the virtual scan of processes, for scalability
>> reasons, swap space has been a little more fragmented than before.
>> This can lead to the situation where a large memory user is killed,
>> swap space ends up full of "holes" and swapin readahead is totally
>> ineffective.
>>
>> On my home system, after killing a leaky firefox it took over an
>> hour to page just under 2GB of memory back in, slowing the virtual
>> machines down to a crawl.
>>
>> This patch makes swapin readahead simply skip over holes, instead
>> of stopping at them.  This allows the system to swap things back in
>> at rates of several MB/second, instead of a few hundred kB/second.
>>
>> The checks done in valid_swaphandles are already done in
>> read_swap_cache_async as well, allowing us to remove a fair amount
>> of code.
>
> __swap_duplicate() also checks for whether the offset is within the
> swap device range.  Do you think we could remove get_swap_cluster()
> altogether and just try reading the aligned page_cluster range?

That is how I implemented it originally, but we need
to take the swap_lock so it is cleaner to implement
a helper function in swapfile.c :)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -mm] make swapin readahead skip over holes
@ 2012-01-11 21:30     ` Rik van Riel
  0 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-01-11 21:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, KOSAKI Motohiro, akpm, mel, minchan.kim

On 01/11/2012 04:10 PM, Johannes Weiner wrote:
> On Wed, Jan 11, 2012 at 02:30:44PM -0500, Rik van Riel wrote:
>> Ever since abandoning the virtual scan of processes, for scalability
>> reasons, swap space has been a little more fragmented than before.
>> This can lead to the situation where a large memory user is killed,
>> swap space ends up full of "holes" and swapin readahead is totally
>> ineffective.
>>
>> On my home system, after killing a leaky firefox it took over an
>> hour to page just under 2GB of memory back in, slowing the virtual
>> machines down to a crawl.
>>
>> This patch makes swapin readahead simply skip over holes, instead
>> of stopping at them.  This allows the system to swap things back in
>> at rates of several MB/second, instead of a few hundred kB/second.
>>
>> The checks done in valid_swaphandles are already done in
>> read_swap_cache_async as well, allowing us to remove a fair amount
>> of code.
>
> __swap_duplicate() also checks for whether the offset is within the
> swap device range.  Do you think we could remove get_swap_cluster()
> altogether and just try reading the aligned page_cluster range?

That is how I implemented it originally, but we need
to take the swap_lock so it is cleaner to implement
a helper function in swapfile.c :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -mm] make swapin readahead skip over holes
  2012-01-11 21:30     ` Rik van Riel
@ 2012-01-11 21:42       ` Johannes Weiner
  -1 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2012-01-11 21:42 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, KOSAKI Motohiro, akpm, mel, minchan.kim

On Wed, Jan 11, 2012 at 04:30:12PM -0500, Rik van Riel wrote:
> On 01/11/2012 04:10 PM, Johannes Weiner wrote:
> >On Wed, Jan 11, 2012 at 02:30:44PM -0500, Rik van Riel wrote:
> >>Ever since abandoning the virtual scan of processes, for scalability
> >>reasons, swap space has been a little more fragmented than before.
> >>This can lead to the situation where a large memory user is killed,
> >>swap space ends up full of "holes" and swapin readahead is totally
> >>ineffective.
> >>
> >>On my home system, after killing a leaky firefox it took over an
> >>hour to page just under 2GB of memory back in, slowing the virtual
> >>machines down to a crawl.
> >>
> >>This patch makes swapin readahead simply skip over holes, instead
> >>of stopping at them.  This allows the system to swap things back in
> >>at rates of several MB/second, instead of a few hundred kB/second.
> >>
> >>The checks done in valid_swaphandles are already done in
> >>read_swap_cache_async as well, allowing us to remove a fair amount
> >>of code.
> >
> >__swap_duplicate() also checks for whether the offset is within the
> >swap device range.  Do you think we could remove get_swap_cluster()
> >altogether and just try reading the aligned page_cluster range?
> 
> That is how I implemented it originally, but we need
> to take the swap_lock so it is cleaner to implement
> a helper function in swapfile.c :)

AFAICS, it's only needed to validate the offset against si->max, but
this too is done in __swap_duplicate().

What's otherwise left is just rounding down swp_offset(entry) and
adding 1 << page_cluster to it, that shouldn't need the swap_lock?

Am I missing something?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -mm] make swapin readahead skip over holes
@ 2012-01-11 21:42       ` Johannes Weiner
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2012-01-11 21:42 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, KOSAKI Motohiro, akpm, mel, minchan.kim

On Wed, Jan 11, 2012 at 04:30:12PM -0500, Rik van Riel wrote:
> On 01/11/2012 04:10 PM, Johannes Weiner wrote:
> >On Wed, Jan 11, 2012 at 02:30:44PM -0500, Rik van Riel wrote:
> >>Ever since abandoning the virtual scan of processes, for scalability
> >>reasons, swap space has been a little more fragmented than before.
> >>This can lead to the situation where a large memory user is killed,
> >>swap space ends up full of "holes" and swapin readahead is totally
> >>ineffective.
> >>
> >>On my home system, after killing a leaky firefox it took over an
> >>hour to page just under 2GB of memory back in, slowing the virtual
> >>machines down to a crawl.
> >>
> >>This patch makes swapin readahead simply skip over holes, instead
> >>of stopping at them.  This allows the system to swap things back in
> >>at rates of several MB/second, instead of a few hundred kB/second.
> >>
> >>The checks done in valid_swaphandles are already done in
> >>read_swap_cache_async as well, allowing us to remove a fair amount
> >>of code.
> >
> >__swap_duplicate() also checks for whether the offset is within the
> >swap device range.  Do you think we could remove get_swap_cluster()
> >altogether and just try reading the aligned page_cluster range?
> 
> That is how I implemented it originally, but we need
> to take the swap_lock so it is cleaner to implement
> a helper function in swapfile.c :)

AFAICS, it's only needed to validate the offset against si->max, but
this too is done in __swap_duplicate().

What's otherwise left is just rounding down swp_offset(entry) and
adding 1 << page_cluster to it, that shouldn't need the swap_lock?

Am I missing something?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -mm] make swapin readahead skip over holes
  2012-01-11 21:42       ` Johannes Weiner
@ 2012-01-11 21:58         ` Rik van Riel
  -1 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-01-11 21:58 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, KOSAKI Motohiro, akpm, mel, minchan.kim

On 01/11/2012 04:42 PM, Johannes Weiner wrote:
> On Wed, Jan 11, 2012 at 04:30:12PM -0500, Rik van Riel wrote:
>> On 01/11/2012 04:10 PM, Johannes Weiner wrote:
>>> On Wed, Jan 11, 2012 at 02:30:44PM -0500, Rik van Riel wrote:
>>>> Ever since abandoning the virtual scan of processes, for scalability
>>>> reasons, swap space has been a little more fragmented than before.
>>>> This can lead to the situation where a large memory user is killed,
>>>> swap space ends up full of "holes" and swapin readahead is totally
>>>> ineffective.
>>>>
>>>> On my home system, after killing a leaky firefox it took over an
>>>> hour to page just under 2GB of memory back in, slowing the virtual
>>>> machines down to a crawl.
>>>>
>>>> This patch makes swapin readahead simply skip over holes, instead
>>>> of stopping at them.  This allows the system to swap things back in
>>>> at rates of several MB/second, instead of a few hundred kB/second.
>>>>
>>>> The checks done in valid_swaphandles are already done in
>>>> read_swap_cache_async as well, allowing us to remove a fair amount
>>>> of code.
>>>
>>> __swap_duplicate() also checks for whether the offset is within the
>>> swap device range.  Do you think we could remove get_swap_cluster()
>>> altogether and just try reading the aligned page_cluster range?
>>
>> That is how I implemented it originally, but we need
>> to take the swap_lock so it is cleaner to implement
>> a helper function in swapfile.c :)
>
> AFAICS, it's only needed to validate the offset against si->max, but
> this too is done in __swap_duplicate().
>
> What's otherwise left is just rounding down swp_offset(entry) and
> adding 1<<  page_cluster to it, that shouldn't need the swap_lock?
>
> Am I missing something?

Good point.  I guess we could get rid of get_swap_cluster
afterall and let __swap_duplicate deal with everything.

Andrew, Mel, is that ok with you?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -mm] make swapin readahead skip over holes
@ 2012-01-11 21:58         ` Rik van Riel
  0 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-01-11 21:58 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, KOSAKI Motohiro, akpm, mel, minchan.kim

On 01/11/2012 04:42 PM, Johannes Weiner wrote:
> On Wed, Jan 11, 2012 at 04:30:12PM -0500, Rik van Riel wrote:
>> On 01/11/2012 04:10 PM, Johannes Weiner wrote:
>>> On Wed, Jan 11, 2012 at 02:30:44PM -0500, Rik van Riel wrote:
>>>> Ever since abandoning the virtual scan of processes, for scalability
>>>> reasons, swap space has been a little more fragmented than before.
>>>> This can lead to the situation where a large memory user is killed,
>>>> swap space ends up full of "holes" and swapin readahead is totally
>>>> ineffective.
>>>>
>>>> On my home system, after killing a leaky firefox it took over an
>>>> hour to page just under 2GB of memory back in, slowing the virtual
>>>> machines down to a crawl.
>>>>
>>>> This patch makes swapin readahead simply skip over holes, instead
>>>> of stopping at them.  This allows the system to swap things back in
>>>> at rates of several MB/second, instead of a few hundred kB/second.
>>>>
>>>> The checks done in valid_swaphandles are already done in
>>>> read_swap_cache_async as well, allowing us to remove a fair amount
>>>> of code.
>>>
>>> __swap_duplicate() also checks for whether the offset is within the
>>> swap device range.  Do you think we could remove get_swap_cluster()
>>> altogether and just try reading the aligned page_cluster range?
>>
>> That is how I implemented it originally, but we need
>> to take the swap_lock so it is cleaner to implement
>> a helper function in swapfile.c :)
>
> AFAICS, it's only needed to validate the offset against si->max, but
> this too is done in __swap_duplicate().
>
> What's otherwise left is just rounding down swp_offset(entry) and
> adding 1<<  page_cluster to it, that shouldn't need the swap_lock?
>
> Am I missing something?

Good point.  I guess we could get rid of get_swap_cluster
afterall and let __swap_duplicate deal with everything.

Andrew, Mel, is that ok with you?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -mm] make swapin readahead skip over holes
  2012-01-11 21:58         ` Rik van Riel
@ 2012-01-12 14:53           ` Mel Gorman
  -1 siblings, 0 replies; 12+ messages in thread
From: Mel Gorman @ 2012-01-12 14:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Johannes Weiner, linux-mm, linux-kernel, KOSAKI Motohiro, akpm,
	mel, minchan.kim

On Wed, Jan 11, 2012 at 04:58:14PM -0500, Rik van Riel wrote:
> On 01/11/2012 04:42 PM, Johannes Weiner wrote:
> >On Wed, Jan 11, 2012 at 04:30:12PM -0500, Rik van Riel wrote:
> >>On 01/11/2012 04:10 PM, Johannes Weiner wrote:
> >>>On Wed, Jan 11, 2012 at 02:30:44PM -0500, Rik van Riel wrote:
> >>>>Ever since abandoning the virtual scan of processes, for scalability
> >>>>reasons, swap space has been a little more fragmented than before.
> >>>>This can lead to the situation where a large memory user is killed,
> >>>>swap space ends up full of "holes" and swapin readahead is totally
> >>>>ineffective.
> >>>>
> >>>>On my home system, after killing a leaky firefox it took over an
> >>>>hour to page just under 2GB of memory back in, slowing the virtual
> >>>>machines down to a crawl.
> >>>>
> >>>>This patch makes swapin readahead simply skip over holes, instead
> >>>>of stopping at them.  This allows the system to swap things back in
> >>>>at rates of several MB/second, instead of a few hundred kB/second.
> >>>>
> >>>>The checks done in valid_swaphandles are already done in
> >>>>read_swap_cache_async as well, allowing us to remove a fair amount
> >>>>of code.
> >>>
> >>>__swap_duplicate() also checks for whether the offset is within the
> >>>swap device range.  Do you think we could remove get_swap_cluster()
> >>>altogether and just try reading the aligned page_cluster range?
> >>
> >>That is how I implemented it originally, but we need
> >>to take the swap_lock so it is cleaner to implement
> >>a helper function in swapfile.c :)
> >
> >AFAICS, it's only needed to validate the offset against si->max, but
> >this too is done in __swap_duplicate().
> >
> >What's otherwise left is just rounding down swp_offset(entry) and
> >adding 1<<  page_cluster to it, that shouldn't need the swap_lock?
> >
> >Am I missing something?
> 
> Good point.  I guess we could get rid of get_swap_cluster
> afterall and let __swap_duplicate deal with everything.
> 
> Andrew, Mel, is that ok with you?
> 

I still think it is a bit unfortunate that we can end up allocating
more pages than we need and the radix_preload() before __swap_duplicate
identifies that the work was unnecessary. However, duplicate logic is
also bad and as you say, in the context of the cost of swap IO, the cost
is negligible and it's a corner case.

I'm ok with it.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -mm] make swapin readahead skip over holes
@ 2012-01-12 14:53           ` Mel Gorman
  0 siblings, 0 replies; 12+ messages in thread
From: Mel Gorman @ 2012-01-12 14:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Johannes Weiner, linux-mm, linux-kernel, KOSAKI Motohiro, akpm,
	mel, minchan.kim

On Wed, Jan 11, 2012 at 04:58:14PM -0500, Rik van Riel wrote:
> On 01/11/2012 04:42 PM, Johannes Weiner wrote:
> >On Wed, Jan 11, 2012 at 04:30:12PM -0500, Rik van Riel wrote:
> >>On 01/11/2012 04:10 PM, Johannes Weiner wrote:
> >>>On Wed, Jan 11, 2012 at 02:30:44PM -0500, Rik van Riel wrote:
> >>>>Ever since abandoning the virtual scan of processes, for scalability
> >>>>reasons, swap space has been a little more fragmented than before.
> >>>>This can lead to the situation where a large memory user is killed,
> >>>>swap space ends up full of "holes" and swapin readahead is totally
> >>>>ineffective.
> >>>>
> >>>>On my home system, after killing a leaky firefox it took over an
> >>>>hour to page just under 2GB of memory back in, slowing the virtual
> >>>>machines down to a crawl.
> >>>>
> >>>>This patch makes swapin readahead simply skip over holes, instead
> >>>>of stopping at them.  This allows the system to swap things back in
> >>>>at rates of several MB/second, instead of a few hundred kB/second.
> >>>>
> >>>>The checks done in valid_swaphandles are already done in
> >>>>read_swap_cache_async as well, allowing us to remove a fair amount
> >>>>of code.
> >>>
> >>>__swap_duplicate() also checks for whether the offset is within the
> >>>swap device range.  Do you think we could remove get_swap_cluster()
> >>>altogether and just try reading the aligned page_cluster range?
> >>
> >>That is how I implemented it originally, but we need
> >>to take the swap_lock so it is cleaner to implement
> >>a helper function in swapfile.c :)
> >
> >AFAICS, it's only needed to validate the offset against si->max, but
> >this too is done in __swap_duplicate().
> >
> >What's otherwise left is just rounding down swp_offset(entry) and
> >adding 1<<  page_cluster to it, that shouldn't need the swap_lock?
> >
> >Am I missing something?
> 
> Good point.  I guess we could get rid of get_swap_cluster
> afterall and let __swap_duplicate deal with everything.
> 
> Andrew, Mel, is that ok with you?
> 

I still think it is a bit unfortunate that we can end up allocating
more pages than we need and the radix_preload() before __swap_duplicate
identifies that the work was unnecessary. However, duplicate logic is
also bad and as you say, in the context of the cost of swap IO, the cost
is negligible and it's a corner case.

I'm ok with it.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-01-12 14:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-11 19:30 [PATCH v2 -mm] make swapin readahead skip over holes Rik van Riel
2012-01-11 19:30 ` Rik van Riel
2012-01-11 21:10 ` Johannes Weiner
2012-01-11 21:10   ` Johannes Weiner
2012-01-11 21:30   ` Rik van Riel
2012-01-11 21:30     ` Rik van Riel
2012-01-11 21:42     ` Johannes Weiner
2012-01-11 21:42       ` Johannes Weiner
2012-01-11 21:58       ` Rik van Riel
2012-01-11 21:58         ` Rik van Riel
2012-01-12 14:53         ` Mel Gorman
2012-01-12 14:53           ` Mel Gorman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.