* [PATCH 0/4] Obey mark_page_accessed hint given by filesystems v2
From: Mel Gorman @ 2013-05-13 10:21 UTC
  To: Alexey Lyahkov, Andrew Perepechko, Robin Dong
  Cc: Theodore Tso, Andrew Morton, Hugh Dickins, Rik van Riel,
	Johannes Weiner, Bernd Schubert, David Howells, Trond Myklebust,
	Linux-fsdevel, Linux-ext4, LKML, Linux-mm, Mel Gorman

This series is in need of Tested-by's and some review before it can be
pushed anywhere. The performance of the tests I ran was not sensitive
to the premature reclaim of buffer pages, although I was able to show
that the average age of buffer pages is now higher, as expected. Also
note the comments at the end of this mail about the average age of file
pages versus buffer pages, which I'd like filesystem people to think about.

Changelog since V1
o Add tracepoint to model age of page types			(mel)

Andrew Perepechko reported a problem whereby pages are being prematurely
evicted as the mark_page_accessed() hint is ignored for pages that are
currently on a pagevec -- http://www.spinics.net/lists/linux-ext4/msg37340.html .
Alexey Lyahkov and Robin Dong have also reported problems recently that
could be due to hot pages reaching the end of the inactive list too quickly
and being reclaimed.

Rather than addressing this on a per-filesystem basis, this series aims
to fix the mark_page_accessed() interface by deferring the decision on
what LRU a page is added to until pagevec drain time and by allowing
mark_page_accessed() to call SetPageActive on a pagevec page.

Patch 1 adds two tracepoints for LRU page activation and insertion. Using
	these tracepoints it is possible to build a model of pages in the
	LRU that can be processed offline.

Patch 2 defers the decision on what LRU to add a page to until the
	pagevec is drained.

Patch 3 searches the local pagevec for pages to mark PageActive on
	mark_page_accessed. The changelog explains why only the local
	pagevec is examined.

Patch 4 tidies up the API.

postmark, a dd-based test and fs-mark in both single and threaded modes
were run, but none of them showed any performance degradation or gain as
a result of the series.

Using patch 1, I built a *very* basic model of the LRU to examine
offline what the average age of different page types on the LRU was in
milliseconds. Of course, capturing the trace distorts the test as it is
written to local disk, but that does not matter for the purposes of this
test. The average ages in milliseconds were:

				    vanilla deferdrain
Average age mapped anon:               1454       1855
Average age mapped file:             127841     143755
Average age unmapped anon:               85        157
Average age unmapped file:            73633      39368
Average age unmapped buffers:         74054     116636

The LRU activity was mostly file pages, which you'd expect for a dd-based
workload. Note that the average age of buffer pages is increased by the
series; this is expected because buffer pages are now added to the active
list when drained from the pagevecs. Note also that the average age of
unmapped file data is decreased, as those pages are still added to the
inactive list and are reclaimed before the buffers. There is no guarantee
this is a universal win for all workloads, and it would be nice if the
filesystem people gave some thought as to whether this decision is
generally a win or a loss.
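
For anyone wanting to rebuild the measurement, below is a *rough*
userspace sketch of the kind of offline model patch 1 enables. It is
not the script that generated the table above; it assumes the default
ftrace text format (a "TIMESTAMP:" field immediately before the event
name) and it measures only one crude proxy for age, the time between a
pfn's LRU insertion and its activation.

/*
 * age_model.c: rough offline model built on the mm_lru_insertion and
 * mm_lru_activate tracepoints. Feed it the trace, e.g.
 *	cat /sys/kernel/debug/tracing/trace | ./age_model
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PFN (1UL << 24)	/* covers 64G of 4K pages; sized for illustration */

static double *insert_time;

/* Walk back from the event name to the "TIMESTAMP:" token and parse it */
static double event_time(const char *line, const char *event)
{
	const char *p = event;

	while (p > line && p[-1] == ' ')	/* skip the gap before the event name */
		p--;
	while (p > line && p[-1] != ' ')	/* back to the start of the timestamp */
		p--;
	return atof(p);				/* atof stops at the trailing ':' */
}

int main(void)
{
	char line[1024];
	double total_age = 0;
	long nr_activated = 0;

	insert_time = calloc(MAX_PFN, sizeof(double));
	if (!insert_time)
		return 1;

	while (fgets(line, sizeof(line), stdin)) {
		unsigned long pfn;
		char *event, *p = strstr(line, " pfn=");

		if (!p || sscanf(p, " pfn=%lu", &pfn) != 1 || pfn >= MAX_PFN)
			continue;

		if ((event = strstr(line, "mm_lru_insertion:"))) {
			insert_time[pfn] = event_time(line, event);
		} else if ((event = strstr(line, "mm_lru_activate:"))) {
			if (insert_time[pfn] > 0) {
				total_age += event_time(line, event) -
					     insert_time[pfn];
				nr_activated++;
				insert_time[pfn] = 0;
			}
		}
	}

	if (nr_activated)
		printf("average insertion-to-activation age: %.1fms over %ld pages\n",
		       total_age * 1000.0 / nr_activated, nr_activated);
	return 0;
}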

 fs/cachefiles/rdwr.c           | 30 ++++----------
 fs/nfs/dir.c                   |  7 +---
 include/linux/pagevec.h        | 34 +---------------
 include/trace/events/pagemap.h | 89 ++++++++++++++++++++++++++++++++++++++++++
 mm/swap.c                      | 82 +++++++++++++++++++++++++-------------
 5 files changed, 154 insertions(+), 88 deletions(-)
 create mode 100644 include/trace/events/pagemap.h

-- 
1.8.1.4



* [PATCH 1/4] mm: Add tracepoints for LRU activation and insertions
From: Mel Gorman @ 2013-05-13 10:21 UTC
  To: Alexey Lyahkov, Andrew Perepechko, Robin Dong
  Cc: Theodore Tso, Andrew Morton, Hugh Dickins, Rik van Riel,
	Johannes Weiner, Bernd Schubert, David Howells, Trond Myklebust,
	Linux-fsdevel, Linux-ext4, LKML, Linux-mm, Mel Gorman

Using these tracepoints it is possible to model LRU activity and the
average residency of pages of different types. This can be used to
debug problems related to premature reclaim of pages of particular
types.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/trace/events/pagemap.h | 89 ++++++++++++++++++++++++++++++++++++++++++
 mm/swap.c                      |  5 +++
 2 files changed, 94 insertions(+)
 create mode 100644 include/trace/events/pagemap.h

diff --git a/include/trace/events/pagemap.h b/include/trace/events/pagemap.h
new file mode 100644
index 0000000..1c9fabd
--- /dev/null
+++ b/include/trace/events/pagemap.h
@@ -0,0 +1,89 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pagemap
+
+#if !defined(_TRACE_PAGEMAP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGEMAP_H
+
+#include <linux/tracepoint.h>
+#include <linux/mm.h>
+
+#define	PAGEMAP_MAPPED		0x0001u
+#define PAGEMAP_ANONYMOUS	0x0002u
+#define PAGEMAP_FILE		0x0004u
+#define PAGEMAP_SWAPCACHE	0x0008u
+#define PAGEMAP_SWAPBACKED	0x0010u
+#define PAGEMAP_MAPPEDDISK	0x0020u
+#define PAGEMAP_BUFFERS		0x0040u
+
+#define trace_pagemap_flags(page) ( \
+	(PageAnon(page)		? PAGEMAP_ANONYMOUS  : PAGEMAP_FILE) | \
+	(page_mapped(page)	? PAGEMAP_MAPPED     : 0) | \
+	(PageSwapCache(page)	? PAGEMAP_SWAPCACHE  : 0) | \
+	(PageSwapBacked(page)	? PAGEMAP_SWAPBACKED : 0) | \
+	(PageMappedToDisk(page)	? PAGEMAP_MAPPEDDISK : 0) | \
+	(page_has_private(page) ? PAGEMAP_BUFFERS    : 0) \
+	)
+
+TRACE_EVENT(mm_lru_insertion,
+
+	TP_PROTO(
+		struct page *page,
+		unsigned long pfn,
+		int lru,
+		unsigned long flags
+	),
+
+	TP_ARGS(page, pfn, lru, flags),
+
+	TP_STRUCT__entry(
+		__field(struct page *,	page	)
+		__field(unsigned long,	pfn	)
+		__field(int,		lru	)
+		__field(unsigned long,	flags	)
+	),
+
+	TP_fast_assign(
+		__entry->page	= page;
+		__entry->pfn	= pfn;
+		__entry->lru	= lru;
+		__entry->flags	= flags;
+	),
+
+	/* Flag format is based on page-types.c formatting for pagemap */
+	TP_printk("page=%p pfn=%lu lru=%d flags=%s%s%s%s%s%s",
+			__entry->page,
+			__entry->pfn,
+			__entry->lru,
+			__entry->flags & PAGEMAP_MAPPED		? "M" : " ",
+			__entry->flags & PAGEMAP_ANONYMOUS	? "a" : "f",
+			__entry->flags & PAGEMAP_SWAPCACHE	? "s" : " ",
+			__entry->flags & PAGEMAP_SWAPBACKED	? "b" : " ",
+			__entry->flags & PAGEMAP_MAPPEDDISK	? "d" : " ",
+			__entry->flags & PAGEMAP_BUFFERS	? "B" : " ")
+);
+
+TRACE_EVENT(mm_lru_activate,
+
+	TP_PROTO(struct page *page, unsigned long pfn),
+
+	TP_ARGS(page, pfn),
+
+	TP_STRUCT__entry(
+		__field(struct page *,	page	)
+		__field(unsigned long,	pfn	)
+	),
+
+	TP_fast_assign(
+		__entry->page	= page;
+		__entry->pfn	= pfn;
+	),
+
+	/* Flag format is based on page-types.c formatting for pagemap */
+	TP_printk("page=%p pfn=%lu", __entry->page, __entry->pfn)
+
+);
+
+#endif /* _TRACE_PAGEMAP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/swap.c b/mm/swap.c
index 8a529a0..c612a6a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -33,6 +33,9 @@
 
 #include "internal.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/pagemap.h>
+
 /* How many pages do we try to swap or page in/out together? */
 int page_cluster;
 
@@ -383,6 +386,7 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
 		SetPageActive(page);
 		lru += LRU_ACTIVE;
 		add_page_to_lru_list(page, lruvec, lru);
+		trace_mm_lru_activate(page, page_to_pfn(page));
 
 		__count_vm_event(PGACTIVATE);
 		update_page_reclaim_stat(lruvec, file, 1);
@@ -802,6 +806,7 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
 		SetPageActive(page);
 	add_page_to_lru_list(page, lruvec, lru);
 	update_page_reclaim_stat(lruvec, file, active);
+	trace_mm_lru_insertion(page, page_to_pfn(page), lru, trace_pagemap_flags(page));
 }
 
 /*
-- 
1.8.1.4
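
As a usage note, once the patch is applied both events can be enabled
through the usual tracefs interface. A minimal sketch, assuming debugfs
is mounted at /sys/kernel/debug:

#include <fcntl.h>
#include <unistd.h>

/* Enable every event in the "pagemap" group added by this patch */
static int pagemap_trace_enable(void)
{
	int fd = open("/sys/kernel/debug/tracing/events/pagemap/enable",
		      O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, "1", 1) != 1) {
		close(fd);
		return -1;
	}
	return close(fd);
}

The same effect can be had from a shell by writing 1 to that file.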



* [PATCH 2/4] mm: pagevec: Defer deciding what LRU to add a page to until pagevec drain time
From: Mel Gorman @ 2013-05-13 10:21 UTC
  To: Alexey Lyahkov, Andrew Perepechko, Robin Dong
  Cc: Theodore Tso, Andrew Morton, Hugh Dickins, Rik van Riel,
	Johannes Weiner, Bernd Schubert, David Howells, Trond Myklebust,
	Linux-fsdevel, Linux-ext4, LKML, Linux-mm, Mel Gorman

mark_page_accessed cannot activate an inactive page that is located on
an inactive LRU pagevec. Hints from filesystems may be ignored as a
result. In preparation for fixing that problem, this patch removes the
per-LRU pagevecs and leaves just one pagevec. The final LRU the page is
added to is deferred until the pagevec is drained.

This means that fewer pagevecs are available and potentially there is
greater contention on the LRU lock. However, this only applies in the case
where there is an almost perfect mix of file, anon, active and inactive
pages being added to the LRU. In practice I expect that we are adding a
stream of pages of a particular type and that any change in contention
will barely be measurable.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
---
 mm/swap.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index c612a6a..0911579 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -39,7 +39,7 @@
 /* How many pages do we try to swap or page in/out together? */
 int page_cluster;
 
-static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 
@@ -460,13 +460,18 @@ EXPORT_SYMBOL(mark_page_accessed);
  */
 void __lru_cache_add(struct page *page, enum lru_list lru)
 {
-	struct pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];
+	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+
+	if (is_active_lru(lru))
+		SetPageActive(page);
+	else
+		ClearPageActive(page);
 
 	page_cache_get(page);
 	if (!pagevec_space(pvec))
 		__pagevec_lru_add(pvec, lru);
 	pagevec_add(pvec, page);
-	put_cpu_var(lru_add_pvecs);
+	put_cpu_var(lru_add_pvec);
 }
 EXPORT_SYMBOL(__lru_cache_add);
 
@@ -479,13 +484,11 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
 {
 	if (PageActive(page)) {
 		VM_BUG_ON(PageUnevictable(page));
-		ClearPageActive(page);
 	} else if (PageUnevictable(page)) {
 		VM_BUG_ON(PageActive(page));
-		ClearPageUnevictable(page);
 	}
 
-	VM_BUG_ON(PageLRU(page) || PageActive(page) || PageUnevictable(page));
+	VM_BUG_ON(PageLRU(page));
 	__lru_cache_add(page, lru);
 }
 
@@ -586,15 +589,10 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
  */
 void lru_add_drain_cpu(int cpu)
 {
-	struct pagevec *pvecs = per_cpu(lru_add_pvecs, cpu);
-	struct pagevec *pvec;
-	int lru;
+	struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);
 
-	for_each_lru(lru) {
-		pvec = &pvecs[lru - LRU_BASE];
-		if (pagevec_count(pvec))
-			__pagevec_lru_add(pvec, lru);
-	}
+	if (pagevec_count(pvec))
+		__pagevec_lru_add(pvec, NR_LRU_LISTS);
 
 	pvec = &per_cpu(lru_rotate_pvecs, cpu);
 	if (pagevec_count(pvec)) {
@@ -793,17 +791,16 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
 static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
 				 void *arg)
 {
-	enum lru_list lru = (enum lru_list)arg;
-	int file = is_file_lru(lru);
-	int active = is_active_lru(lru);
+	enum lru_list requested_lru = (enum lru_list)arg;
+	int file = page_is_file_cache(page);
+	int active = PageActive(page);
+	enum lru_list lru = page_lru(page);
 
-	VM_BUG_ON(PageActive(page));
+	WARN_ON_ONCE(requested_lru < NR_LRU_LISTS && requested_lru != lru);
 	VM_BUG_ON(PageUnevictable(page));
 	VM_BUG_ON(PageLRU(page));
 
 	SetPageLRU(page);
-	if (active)
-		SetPageActive(page);
 	add_page_to_lru_list(page, lruvec, lru);
 	update_page_reclaim_stat(lruvec, file, active);
 	trace_mm_lru_insertion(page, page_to_pfn(page), lru, trace_pagemap_flags(page));
-- 
1.8.1.4
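
To make the deferral concrete: with a single pagevec, the destination
list is recomputed from the page's own flags when the pagevec drains.
A simplified sketch of the decision page_lru() makes (not the kernel's
exact code, and assuming the flag layout of this era):

static enum lru_list sketch_page_lru(struct page *page)
{
	enum lru_list lru = LRU_INACTIVE_ANON;

	if (PageUnevictable(page))
		return LRU_UNEVICTABLE;
	if (!PageSwapBacked(page))	/* i.e. page_is_file_cache() */
		lru = LRU_INACTIVE_FILE;
	if (PageActive(page))
		lru += LRU_ACTIVE;	/* inactive list + 1 == active list */
	return lru;
}

This is why __lru_cache_add() only has to set or clear PageActive and
the drain does the rest.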



* [PATCH 3/4] mm: Activate !PageLRU pages on mark_page_accessed if page is on local pagevec
From: Mel Gorman @ 2013-05-13 10:21 UTC
  To: Alexey Lyahkov, Andrew Perepechko, Robin Dong
  Cc: Theodore Tso, Andrew Morton, Hugh Dickins, Rik van Riel,
	Johannes Weiner, Bernd Schubert, David Howells, Trond Myklebust,
	Linux-fsdevel, Linux-ext4, LKML, Linux-mm, Mel Gorman

If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
may fail to move it to the active list as expected. Now that the LRU is
selected at LRU drain time, mark pages PageActive if they are on the
local pagevec so that they are moved to the correct list at LRU drain
time. Using a debugging patch it was found that, for a simple git
checkout based workload, pages were never added to the active file list
in practice, but with this patch applied they are.

				before   after
LRU Add Active File                  0      750583
LRU Add Active Anon            2640587     2702818
LRU Add Inactive File          8833662     8068353
LRU Add Inactive Anon              207         200

Note that only pages on the local pagevec are considered on purpose. A
!PageLRU page could be in the process of being released, reclaimed, migrated
or on a remote pagevec that is currently being drained. Marking it PageActive
is vulnerable to races where the PageLRU and Active bits are checked at the
wrong time. Page reclaim will trigger VM_BUG_ONs but depending on when the
race hits, it could also free a PageActive page to the page allocator and
trigger a bad_page warning. Similarly a potential race exists between a
per-cpu drain on a pagevec list and an activation on a remote CPU.

				lru_add_drain_cpu
				__pagevec_lru_add
				  lru = page_lru(page);
mark_page_accessed
  if (PageLRU(page))
    activate_page
  else
    SetPageActive
				  SetPageLRU(page);
				  add_page_to_lru_list(page, lruvec, lru);

In this case a PageActive page is added to the inactive list and later the
inactive/active stats will get skewed. While the PageActive checks in vmscan
could be removed and potentially dealt with, a skew in the statistics would
be very difficult to detect. Hence this patch deals just with the common case
where a page being marked accessed has just been added to the local pagevec.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/swap.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 0911579..7752407 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -431,6 +431,27 @@ void activate_page(struct page *page)
 }
 #endif
 
+static void __lru_cache_activate_page(struct page *page)
+{
+	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+	int i;
+
+	/*
+	 * Search backwards on the optimistic assumption that the page being
+	 * activated has just been added to this pagevec
+	 */
+	for (i = pagevec_count(pvec) - 1; i >= 0; i--) {
+		struct page *pagevec_page = pvec->pages[i];
+
+		if (pagevec_page == page) {
+			SetPageActive(page);
+			break;
+		}
+	}
+
+	put_cpu_var(lru_add_pvec);
+}
+
 /*
  * Mark a page as having seen activity.
  *
@@ -441,8 +462,17 @@ void activate_page(struct page *page)
 void mark_page_accessed(struct page *page)
 {
 	if (!PageActive(page) && !PageUnevictable(page) &&
-			PageReferenced(page) && PageLRU(page)) {
-		activate_page(page);
+			PageReferenced(page)) {
+
+		/*
+		 * If the page is on the LRU, promote immediately. Otherwise,
+		 * assume the page is on a pagevec, mark it active and it'll
+		 * be moved to the active LRU on the next drain
+		 */
+		if (PageLRU(page))
+			activate_page(page);
+		else
+			__lru_cache_activate_page(page);
 		ClearPageReferenced(page);
 	} else if (!PageReferenced(page)) {
 		SetPageReferenced(page);
-- 
1.8.1.4
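
A rough way to observe the behaviour change from userspace is to read a
large file twice and watch nr_active_file in /proc/vmstat. This is a
sketch rather than a rigorous test: the counter is noisy, the file must
fit in RAM, and it has to be large enough that the pagevecs drain while
the test runs.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static long read_vmstat(const char *name)
{
	char line[128];
	long val = -1;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, name, strlen(name))) {
			val = atol(line + strlen(name));
			break;
		}
	}
	fclose(f);
	return val;
}

static void read_file(const char *path)
{
	char buf[1 << 16];
	FILE *f = fopen(path, "r");

	if (!f)
		exit(1);
	while (fread(buf, 1, sizeof(buf), f) > 0)
		;
	fclose(f);
}

int main(int argc, char **argv)
{
	long before, after;

	if (argc != 2)
		return 1;

	before = read_vmstat("nr_active_file ");
	read_file(argv[1]);	/* first read: pages marked referenced */
	read_file(argv[1]);	/* second read: mark_page_accessed activates */
	after = read_vmstat("nr_active_file ");
	printf("nr_active_file grew by %ld pages\n", after - before);
	return 0;
}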



* [PATCH 4/4] mm: Remove lru parameter from __pagevec_lru_add and remove parts of pagevec API
From: Mel Gorman @ 2013-05-13 10:21 UTC
  To: Alexey Lyahkov, Andrew Perepechko, Robin Dong
  Cc: Theodore Tso, Andrew Morton, Hugh Dickins, Rik van Riel,
	Johannes Weiner, Bernd Schubert, David Howells, Trond Myklebust,
	Linux-fsdevel, Linux-ext4, LKML, Linux-mm, Mel Gorman

Now that the LRU to add a page to is decided at LRU-add time, remove the
misleading lru parameter from __pagevec_lru_add. A consequence of this is
that the pagevec_lru_add_file, pagevec_lru_add_anon and similar helpers
are misleading as the caller no longer has direct control over what LRU
the page is added to. Unused helpers are removed by this patch and existing
users of pagevec_lru_add_file() are converted to use lru_cache_add_file()
directly and use the per-cpu pagevecs instead of creating their own pagevec.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/cachefiles/rdwr.c    | 30 +++++++-----------------------
 fs/nfs/dir.c            |  7 ++-----
 include/linux/pagevec.h | 34 +---------------------------------
 mm/swap.c               | 12 ++++--------
 4 files changed, 14 insertions(+), 69 deletions(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 4809922..d24c13e 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -12,6 +12,7 @@
 #include <linux/mount.h>
 #include <linux/slab.h>
 #include <linux/file.h>
+#include <linux/swap.h>
 #include "internal.h"
 
 /*
@@ -227,8 +228,7 @@ static void cachefiles_read_copier(struct fscache_operation *_op)
  */
 static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 					    struct fscache_retrieval *op,
-					    struct page *netpage,
-					    struct pagevec *pagevec)
+					    struct page *netpage)
 {
 	struct cachefiles_one_read *monitor;
 	struct address_space *bmapping;
@@ -237,8 +237,6 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 
 	_enter("");
 
-	pagevec_reinit(pagevec);
-
 	_debug("read back %p{%lu,%d}",
 	       netpage, netpage->index, page_count(netpage));
 
@@ -283,9 +281,7 @@ installed_new_backing_page:
 	backpage = newpage;
 	newpage = NULL;
 
-	page_cache_get(backpage);
-	pagevec_add(pagevec, backpage);
-	__pagevec_lru_add_file(pagevec);
+	lru_cache_add_file(backpage);
 
 read_backing_page:
 	ret = bmapping->a_ops->readpage(NULL, backpage);
@@ -452,8 +448,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
 	if (block) {
 		/* submit the apparently valid page to the backing fs to be
 		 * read from disk */
-		ret = cachefiles_read_backing_file_one(object, op, page,
-						       &pagevec);
+		ret = cachefiles_read_backing_file_one(object, op, page);
 	} else if (cachefiles_has_space(cache, 0, 1) == 0) {
 		/* there's space in the cache we can use */
 		fscache_mark_page_cached(op, page);
@@ -482,14 +477,11 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 {
 	struct cachefiles_one_read *monitor = NULL;
 	struct address_space *bmapping = object->backer->d_inode->i_mapping;
-	struct pagevec lru_pvec;
 	struct page *newpage = NULL, *netpage, *_n, *backpage = NULL;
 	int ret = 0;
 
 	_enter("");
 
-	pagevec_init(&lru_pvec, 0);
-
 	list_for_each_entry_safe(netpage, _n, list, lru) {
 		list_del(&netpage->lru);
 
@@ -534,9 +526,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 		backpage = newpage;
 		newpage = NULL;
 
-		page_cache_get(backpage);
-		if (!pagevec_add(&lru_pvec, backpage))
-			__pagevec_lru_add_file(&lru_pvec);
+		lru_cache_add_file(backpage);
 
 	reread_backing_page:
 		ret = bmapping->a_ops->readpage(NULL, backpage);
@@ -559,9 +549,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 			goto nomem;
 		}
 
-		page_cache_get(netpage);
-		if (!pagevec_add(&lru_pvec, netpage))
-			__pagevec_lru_add_file(&lru_pvec);
+		lru_cache_add_file(netpage);
 
 		/* install a monitor */
 		page_cache_get(netpage);
@@ -643,9 +631,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 
 		fscache_mark_page_cached(op, netpage);
 
-		page_cache_get(netpage);
-		if (!pagevec_add(&lru_pvec, netpage))
-			__pagevec_lru_add_file(&lru_pvec);
+		lru_cache_add_file(netpage);
 
 		/* the netpage is unlocked and marked up to date here */
 		fscache_end_io(op, netpage, 0);
@@ -661,8 +647,6 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 
 out:
 	/* tidy up */
-	pagevec_lru_add_file(&lru_pvec);
-
 	if (newpage)
 		page_cache_release(newpage);
 	if (netpage)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index f23f455..787d89c 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -33,6 +33,7 @@
 #include <linux/pagevec.h>
 #include <linux/namei.h>
 #include <linux/mount.h>
+#include <linux/swap.h>
 #include <linux/sched.h>
 #include <linux/kmemleak.h>
 #include <linux/xattr.h>
@@ -1757,7 +1758,6 @@ EXPORT_SYMBOL_GPL(nfs_unlink);
  */
 int nfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
 {
-	struct pagevec lru_pvec;
 	struct page *page;
 	char *kaddr;
 	struct iattr attr;
@@ -1797,11 +1797,8 @@ int nfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
 	 * No big deal if we can't add this page to the page cache here.
 	 * READLINK will get the missing page from the server if needed.
 	 */
-	pagevec_init(&lru_pvec, 0);
-	if (!add_to_page_cache(page, dentry->d_inode->i_mapping, 0,
+	if (!add_to_page_cache_lru(page, dentry->d_inode->i_mapping, 0,
 							GFP_KERNEL)) {
-		pagevec_add(&lru_pvec, page);
-		pagevec_lru_add_file(&lru_pvec);
 		SetPageUptodate(page);
 		unlock_page(page);
 	} else
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 2aa12b8..e4dbfab3 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -21,7 +21,7 @@ struct pagevec {
 };
 
 void __pagevec_release(struct pagevec *pvec);
-void __pagevec_lru_add(struct pagevec *pvec, enum lru_list lru);
+void __pagevec_lru_add(struct pagevec *pvec);
 unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
 		pgoff_t start, unsigned nr_pages);
 unsigned pagevec_lookup_tag(struct pagevec *pvec,
@@ -64,36 +64,4 @@ static inline void pagevec_release(struct pagevec *pvec)
 		__pagevec_release(pvec);
 }
 
-static inline void __pagevec_lru_add_anon(struct pagevec *pvec)
-{
-	__pagevec_lru_add(pvec, LRU_INACTIVE_ANON);
-}
-
-static inline void __pagevec_lru_add_active_anon(struct pagevec *pvec)
-{
-	__pagevec_lru_add(pvec, LRU_ACTIVE_ANON);
-}
-
-static inline void __pagevec_lru_add_file(struct pagevec *pvec)
-{
-	__pagevec_lru_add(pvec, LRU_INACTIVE_FILE);
-}
-
-static inline void __pagevec_lru_add_active_file(struct pagevec *pvec)
-{
-	__pagevec_lru_add(pvec, LRU_ACTIVE_FILE);
-}
-
-static inline void pagevec_lru_add_file(struct pagevec *pvec)
-{
-	if (pagevec_count(pvec))
-		__pagevec_lru_add_file(pvec);
-}
-
-static inline void pagevec_lru_add_anon(struct pagevec *pvec)
-{
-	if (pagevec_count(pvec))
-		__pagevec_lru_add_anon(pvec);
-}
-
 #endif /* _LINUX_PAGEVEC_H */
diff --git a/mm/swap.c b/mm/swap.c
index 7752407..bd4f524 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -499,7 +499,7 @@ void __lru_cache_add(struct page *page, enum lru_list lru)
 
 	page_cache_get(page);
 	if (!pagevec_space(pvec))
-		__pagevec_lru_add(pvec, lru);
+		__pagevec_lru_add(pvec);
 	pagevec_add(pvec, page);
 	put_cpu_var(lru_add_pvec);
 }
@@ -622,7 +622,7 @@ void lru_add_drain_cpu(int cpu)
 	struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);
 
 	if (pagevec_count(pvec))
-		__pagevec_lru_add(pvec, NR_LRU_LISTS);
+		__pagevec_lru_add(pvec);
 
 	pvec = &per_cpu(lru_rotate_pvecs, cpu);
 	if (pagevec_count(pvec)) {
@@ -821,12 +821,10 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
 static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
 				 void *arg)
 {
-	enum lru_list requested_lru = (enum lru_list)arg;
 	int file = page_is_file_cache(page);
 	int active = PageActive(page);
 	enum lru_list lru = page_lru(page);
 
-	WARN_ON_ONCE(requested_lru < NR_LRU_LISTS && requested_lru != lru);
 	VM_BUG_ON(PageUnevictable(page));
 	VM_BUG_ON(PageLRU(page));
 
@@ -840,11 +838,9 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
  * Add the passed pages to the LRU, then drop the caller's refcount
  * on them.  Reinitialises the caller's pagevec.
  */
-void __pagevec_lru_add(struct pagevec *pvec, enum lru_list lru)
+void __pagevec_lru_add(struct pagevec *pvec)
 {
-	VM_BUG_ON(is_unevictable_lru(lru));
-
-	pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, (void *)lru);
+	pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, NULL);
 }
 EXPORT_SYMBOL(__pagevec_lru_add);
 
-- 
1.8.1.4



* Re: [PATCH 1/4] mm: Add tracepoints for LRU activation and insertions
From: Rik van Riel @ 2013-05-15 17:39 UTC
  To: Mel Gorman
  Cc: Alexey Lyahkov, Andrew Perepechko, Robin Dong, Theodore Tso,
	Andrew Morton, Hugh Dickins, Johannes Weiner, Bernd Schubert,
	David Howells, Trond Myklebust, Linux-fsdevel, Linux-ext4, LKML,
	Linux-mm

On 05/13/2013 06:21 AM, Mel Gorman wrote:
> Using these tracepoints it is possible to model LRU activity and the
> average residency of pages of different types. This can be used to
> debug problems related to premature reclaim of pages of particular
> types.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: Rik van Riel <riel@redhat.com>


-- 
All rights reversed


* Re: [PATCH 3/4] mm: Activate !PageLRU pages on mark_page_accessed if page is on local pagevec
From: Rik van Riel @ 2013-05-15 17:40 UTC
  To: Mel Gorman
  Cc: Alexey Lyahkov, Andrew Perepechko, Robin Dong, Theodore Tso,
	Andrew Morton, Hugh Dickins, Johannes Weiner, Bernd Schubert,
	David Howells, Trond Myklebust, Linux-fsdevel, Linux-ext4, LKML,
	Linux-mm

On 05/13/2013 06:21 AM, Mel Gorman wrote:

> In this case a PageActive page is added to the inactive list and later the
> inactive/active stats will get skewed. While the PageActive checks in vmscan
> could be removed and potentially dealt with, a skew in the statistics would
> be very difficult to detect. Hence this patch deals just with the common case
> where a page being marked accessed has just been added to the local pagevec.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

After thinking about it some more, I suspect the possible issue
I outlined before should not be an issue in practice.

Acked-by: Rik van Riel <riel@redhat.com>


-- 
All rights reversed


* Re: [PATCH 4/4] mm: Remove lru parameter from __pagevec_lru_add and remove parts of pagevec API
  2013-05-13 10:21   ` Mel Gorman
@ 2013-05-15 17:46     ` Rik van Riel
  -1 siblings, 0 replies; 30+ messages in thread
From: Rik van Riel @ 2013-05-15 17:46 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Alexey Lyahkov, Andrew Perepechko, Robin Dong, Theodore Tso,
	Andrew Morton, Hugh Dickins, Johannes Weiner, Bernd Schubert,
	David Howells, Trond Myklebust, Linux-fsdevel, Linux-ext4, LKML,
	Linux-mm

On 05/13/2013 06:21 AM, Mel Gorman wrote:
> Now that the LRU to add a page to is decided at LRU-add time, remove the
> misleading lru parameter from __pagevec_lru_add. A consequence of this is
> that the pagevec_lru_add_file, pagevec_lru_add_anon and similar helpers
> are misleading as the caller no longer has direct control over what LRU
> the page is added to. Unused helpers are removed by this patch and existing
> users of pagevec_lru_add_file() are converted to use lru_cache_add_file()
> directly and use the per-cpu pagevecs instead of creating their own pagevec.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Jan Kara <jack@suse.cz>

Reviewed-by: Rik van Riel <riel@redhat.com>


-- 
All rights reversed

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] mm: pagevec: Defer deciding what LRU to add a page to until pagevec drain time
  2013-05-13 10:21   ` Mel Gorman
@ 2013-05-15 22:53     ` Andrew Morton
  -1 siblings, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2013-05-15 22:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Alexey Lyahkov, Andrew Perepechko, Robin Dong, Theodore Tso,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Bernd Schubert,
	David Howells, Trond Myklebust, Linux-fsdevel, Linux-ext4, LKML,
	Linux-mm

On Mon, 13 May 2013 11:21:20 +0100 Mel Gorman <mgorman@suse.de> wrote:

> mark_page_accessed cannot activate an inactive page that is located on
> an inactive LRU pagevec. Hints from filesystems may be ignored as a
> result. In preparation for fixing that problem, this patch removes the
> per-LRU pagevecs and leaves just one pagevec. The final LRU the page is
> added to is deferred until the pagevec is drained.
> 
> This means that fewer pagevecs are available and potentially there is
> greater contention on the LRU lock. However, this only applies in the case
> where there is an almost perfect mix of file, anon, active and inactive
> pages being added to the LRU. In practice I expect that we are adding a
> stream of pages of a particular type and that the changes in contention
> will barely be measurable.
> 
> ...
>
> index c612a6a..0911579 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -39,7 +39,7 @@
>  /* How many pages do we try to swap or page in/out together? */
>  int page_cluster;
>  
> -static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
> +static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
>  static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
>  static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
>  
> @@ -460,13 +460,18 @@ EXPORT_SYMBOL(mark_page_accessed);
>   */

The comment preceding __lru_cache_add() needs an update.

>  void __lru_cache_add(struct page *page, enum lru_list lru)
>  {
> -	struct pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];
> +	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
> +
> +	if (is_active_lru(lru))
> +		SetPageActive(page);
> +	else
> +		ClearPageActive(page);

The whole use of `enum lru_list' here has always made me cry, but I'll weep
about that in [4/4].

>  	page_cache_get(page);
>  	if (!pagevec_space(pvec))
>  		__pagevec_lru_add(pvec, lru);
>  	pagevec_add(pvec, page);
> -	put_cpu_var(lru_add_pvecs);
> +	put_cpu_var(lru_add_pvec);
>  }
>
> ...
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/4] mm: Activate !PageLRU pages on mark_page_accessed if page is on local pagevec
  2013-05-13 10:21   ` Mel Gorman
@ 2013-05-15 22:55     ` Andrew Morton
  -1 siblings, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2013-05-15 22:55 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Alexey Lyahkov, Andrew Perepechko, Robin Dong, Theodore Tso,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Bernd Schubert,
	David Howells, Trond Myklebust, Linux-fsdevel, Linux-ext4, LKML,
	Linux-mm

On Mon, 13 May 2013 11:21:21 +0100 Mel Gorman <mgorman@suse.de> wrote:

> If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
> may fail to move a page to the active list as expected. Now that the LRU
> is selected at LRU drain time, mark pages PageActive if they are on the
> local pagevec so it gets moved to the correct list at LRU drain time.
> Using a debugging patch it was found that for a simple git checkout based
> workload that pages were never added to the active file list in practice
> but with this patch applied they are.
> 
> 				before   after
> LRU Add Active File                  0      750583
> LRU Add Active Anon            2640587     2702818
> LRU Add Inactive File          8833662     8068353
> LRU Add Inactive Anon              207         200
> 
> Note that only pages on the local pagevec are considered on purpose. A
> !PageLRU page could be in the process of being released, reclaimed, migrated
> or on a remote pagevec that is currently being drained. Marking it PageActive
> is vulnerable to races where PageLRU and Active bits are checked at the
> wrong time. Page reclaim will trigger VM_BUG_ONs but depending on when the
> race hits, it could also free a PageActive page to the page allocator and
> trigger a bad_page warning. Similarly a potential race exists between a
> per-cpu drain on a pagevec list and an activation on a remote CPU.
> 
> 				lru_add_drain_cpu
> 				__pagevec_lru_add
> 				  lru = page_lru(page);
> mark_page_accessed
>   if (PageLRU(page))
>     activate_page
>   else
>     SetPageActive
> 				  SetPageLRU(page);
> 				  add_page_to_lru_list(page, lruvec, lru);
> 
> In this case a PageActive page is added to the inactive list and later the
> inactive/active stats will get skewed. While the PageActive checks in vmscan
> could be removed and potentially dealt with, a skew in the statistics would
> be very difficult to detect. Hence this patch deals just with the common case
> where a page being marked accessed has just been added to the local pagevec.

but but but

> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -431,6 +431,27 @@ void activate_page(struct page *page)
>  }
>  #endif
>  
> +static void __lru_cache_activate_page(struct page *page)
> +{
> +	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
> +	int i;
> +
> +	/*
> +	 * Search backwards on the optimistic assumption that the page being
> +	 * activated has just been added to this pagevec
> +	 */
> +	for (i = pagevec_count(pvec) - 1; i >= 0; i--) {
> +		struct page *pagevec_page = pvec->pages[i];
> +
> +		if (pagevec_page == page) {
> +			SetPageActive(page);
> +			break;
> +		}
> +	}
> +
> +	put_cpu_var(lru_add_pvec);
> +}
> +
>  /*
>   * Mark a page as having seen activity.
>   *
> @@ -441,8 +462,17 @@ void activate_page(struct page *page)
>  void mark_page_accessed(struct page *page)
>  {
>  	if (!PageActive(page) && !PageUnevictable(page) &&
> -			PageReferenced(page) && PageLRU(page)) {
> -		activate_page(page);
> +			PageReferenced(page)) {
> +
> +		/*
> +		 * If the page is on the LRU, promote immediately. Otherwise,
> +		 * assume the page is on a pagevec, mark it active and it'll
> +		 * be moved to the active LRU on the next drain
> +		 */
> +		if (PageLRU(page))
> +			activate_page(page);
> +		else
> +			__lru_cache_activate_page(page);
>  		ClearPageReferenced(page);
>  	} else if (!PageReferenced(page)) {
>  		SetPageReferenced(page);

For starters, activate_page() doesn't "promote immediately".  It sticks
the page into yet another pagevec for deferred activation.

Also, I really worry about the fact that
activate_page()->drain->__activate_page() will simply skip over the
page if it has PageActive set!  So PageActive does something useful if
the page is in the add-to-lru pagevec but nothing useful if the page is
in the activate-it-soon pagevec.  This is a confusing, unobvious bug
attractant.

Secondly, I really don't see how this code avoids the races.  Suppose
the page gets spilled from the to-add-to-lru pagevec and onto the real
LRU while mark_page_accessed() is concurrently executing.  We end up
setting PageActive on a page which is on the inactive LRU?  Maybe this
is a can't-happen, in which case it's nowhere near clear enough *why*
this can't happen.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] mm: Remove lru parameter from __pagevec_lru_add and remove parts of pagevec API
  2013-05-13 10:21   ` Mel Gorman
@ 2013-05-15 22:56     ` Andrew Morton
  -1 siblings, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2013-05-15 22:56 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Alexey Lyahkov, Andrew Perepechko, Robin Dong, Theodore Tso,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Bernd Schubert,
	David Howells, Trond Myklebust, Linux-fsdevel, Linux-ext4, LKML,
	Linux-mm

On Mon, 13 May 2013 11:21:22 +0100 Mel Gorman <mgorman@suse.de> wrote:

> Now that the LRU to add a page to is decided at LRU-add time, remove the
> misleading lru parameter from __pagevec_lru_add. A consequence of this is
> that the pagevec_lru_add_file, pagevec_lru_add_anon and similar helpers
> are misleading as the caller no longer has direct control over what LRU
> the page is added to. Unused helpers are removed by this patch and existing
> users of pagevec_lru_add_file() are converted to use lru_cache_add_file()
> directly and use the per-cpu pagevecs instead of creating their own pagevec.

Well maybe.  The `lru' arg to __lru_cache_add is still there and is
rather misleading (I find it maddening ;)).  AIUI, it's just there as
the means by which the __lru_cache_add() caller tells the LRU manager
that the caller wishes this page to start life on the active LRU, yes? 
It doesn't _really_ specify an LRU list at all.

In which case I think it would be a heck of a lot clearer if the
callers were to do

	SetPageActive(page);
	__lru_cache_add(page);

no?  (Or __lru_cache_add_active(page) and
__lru_cache_add_inactive(page) if one prefers).

Ditto lru_cache_add_lru() and probably other things.  Let's have one
way of communicating activeness, not two.
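
[Editor's sketch: the helper pair suggested above in miniature, assuming an
__lru_cache_add() that has already lost its lru argument (as in Mel's
follow-up later in the thread); these wrappers are illustrative, not an
existing kernel API.]

	/* Hypothetical helpers: activeness is communicated only via the
	 * PageActive flag before the page is queued on the per-cpu
	 * lru_add_pvec. */
	static inline void __lru_cache_add_active(struct page *page)
	{
		SetPageActive(page);
		__lru_cache_add(page);
	}

	static inline void __lru_cache_add_inactive(struct page *page)
	{
		ClearPageActive(page);
		__lru_cache_add(page);
	}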



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/4] mm: Activate !PageLRU pages on mark_page_accessed if page is on local pagevec
  2013-05-15 22:55     ` Andrew Morton
@ 2013-05-16 13:41       ` Mel Gorman
  -1 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2013-05-16 13:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexey Lyahkov, Andrew Perepechko, Robin Dong, Theodore Tso,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Bernd Schubert,
	David Howells, Trond Myklebust, Linux-fsdevel, Linux-ext4, LKML,
	Linux-mm

On Wed, May 15, 2013 at 03:55:00PM -0700, Andrew Morton wrote:
> > @@ -441,8 +462,17 @@ void activate_page(struct page *page)
> >  void mark_page_accessed(struct page *page)
> >  {
> >  	if (!PageActive(page) && !PageUnevictable(page) &&
> > -			PageReferenced(page) && PageLRU(page)) {
> > -		activate_page(page);
> > +			PageReferenced(page)) {
> > +
> > +		/*
> > +		 * If the page is on the LRU, promote immediately. Otherwise,
> > +		 * assume the page is on a pagevec, mark it active and it'll
> > +		 * be moved to the active LRU on the next drain
> > +		 */
> > +		if (PageLRU(page))
> > +			activate_page(page);
> > +		else
> > +			__lru_cache_activate_page(page);
> >  		ClearPageReferenced(page);
> >  	} else if (!PageReferenced(page)) {
> >  		SetPageReferenced(page);
> 
> For starters, activate_page() doesn't "promote immediately".  It sticks
> the page into yet another pagevec for deferred activation.
> 

True, comment updated.

> Also, I really worry about the fact that
> activate_page()->drain->__activate_page() will simply skip over the
> page if it has PageActive set!  So PageActive does something useful if
> the page is in the add-to-lru pagevec but nothing useful if the page is
> in the activate-it-soon pagevec.  This is a confusing, unobvious bug
> attractant.
> 

From mark_page_accessed, we only call activate_page() for !PageActive
and PageLRU. The PageLRU check is key: if it's set, the pages *must* be on the
inactive list or they'd trigger BUG_ON(PageActive) checks within
vmscan.c. Am I missing your point?

If I remove the PageActive check in mark_page_accessed then pages that
are already on the active list will always get moved to the top of the
active list. If that page is frequently passed to mark_page_accessed(),
it will both potentially increase the lifetime of the page and the
amount of LRU churn. This would be very unexpected.

> Secondly, I really don't see how this code avoids the races.  Suppose
> the page gets spilled from the to-add-to-lru pagevec and onto the real
> LRU while mark_page_accessed() is concurrently executing. 

Good question. The key here is that __lru_cache_activate_page only
searches the pagevec for the local CPU. If the current CPU is draining the
to_add_to_lru pagevec, it cannot also be simultaneously setting PageActive
in mark_page_accessed. It was discussed in the changelog here.

"Note that only pages on the local pagevec are considered on purpose. A
!PageLRU page could be in the process of being released, reclaimed,
migrated or on a remote pagevec that is currently being drained. Marking
it PageActive is vulnerable to races where PageLRU and Active bits are
checked at the wrong time."

Subtle comments on the code belong in the changelog, right?
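
[Editor's sketch: the locality argument in miniature. This is an annotated
excerpt of the approach, not new code; get_cpu_var() and put_cpu_var()
bracket the search with preemption disabled.]

	/*
	 * get_cpu_var() disables preemption, so between these two calls
	 * this CPU cannot enter lru_add_drain_cpu() and spill its pagevec
	 * onto the LRU. A page found in the local pagevec is therefore
	 * still !PageLRU and it is safe to SetPageActive() on it.
	 */
	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

	/* ... search pvec->pages[] backwards and SetPageActive() on a
	 * match, as __lru_cache_activate_page() does ... */

	put_cpu_var(lru_add_pvec);	/* re-enables preemption */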

> We end up
> setting PageActive on a page which is on the inactive LRU?  Maybe this
> is a can't-happen, in which case it's nowhere near clear enough *why*
> this can't happen.
> 

I don't think it can happen. If I'm wrong, PageActive checks in vmscan.c
will trigger.

How about putting this fix on top?

---8<---
mm: Activate !PageLRU pages on mark_page_accessed if page is on local pagevec -fix

Give the comments a beefier arm.

Signed-off-by: Mel Gorman <mgorman@suse.de>

diff --git a/mm/swap.c b/mm/swap.c
index 5646e31..49eb93f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -439,7 +439,13 @@ static void __lru_cache_activate_page(struct page *page)
 
 	/*
 	 * Search backwards on the optimistic assumption that the page being
-	 * activated has just been added to this pagevec
+	 * activated has just been added to this pagevec. Note that only
+	 * the local pagevec is examined as a !PageLRU page could be in the
+	 * process of being released, reclaimed, migrated or on a remote
+	 * pagevec that is currently being drained. Furthermore, marking
+	 * a remote pagevec's page PageActive potentially hits a race where
+	 * a page is marked PageActive just after it is added to the inactive
+	 * list causing accounting errors and BUG_ON checks to trigger.
 	 */
 	for (i = pagevec_count(pvec) - 1; i >= 0; i--) {
 		struct page *pagevec_page = pvec->pages[i];
@@ -466,9 +472,10 @@ void mark_page_accessed(struct page *page)
 			PageReferenced(page)) {
 
 		/*
-		 * If the page is on the LRU, promote immediately. Otherwise,
-		 * assume the page is on a pagevec, mark it active and it'll
-		 * be moved to the active LRU on the next drain
+		 * If the page is on the LRU, queue it for activation via
+		 * activate_page_pvecs. Otherwise, assume the page is on a
+		 * pagevec, mark it active and it'll be moved to the active
+		 * LRU on the next drain.
 		 */
 		if (PageLRU(page))
 			activate_page(page);

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] mm: Remove lru parameter from __pagevec_lru_add and remove parts of pagevec API
  2013-05-15 22:56     ` Andrew Morton
@ 2013-05-16 14:19       ` Mel Gorman
  -1 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2013-05-16 14:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexey Lyahkov, Andrew Perepechko, Robin Dong, Theodore Tso,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Bernd Schubert,
	David Howells, Trond Myklebust, Linux-fsdevel, Linux-ext4, LKML,
	Linux-mm

On Wed, May 15, 2013 at 03:56:01PM -0700, Andrew Morton wrote:
> On Mon, 13 May 2013 11:21:22 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > Now that the LRU to add a page to is decided at LRU-add time, remove the
> > misleading lru parameter from __pagevec_lru_add. A consequence of this is
> > that the pagevec_lru_add_file, pagevec_lru_add_anon and similar helpers
> > are misleading as the caller no longer has direct control over what LRU
> > the page is added to. Unused helpers are removed by this patch and existing
> > users of pagevec_lru_add_file() are converted to use lru_cache_add_file()
> > directly and use the per-cpu pagevecs instead of creating their own pagevec.
> 
> Well maybe.  The `lru' arg to __lru_cache_add is still there and is
> rather misleading (I find it maddening ;)).  AIUI, it's just there as
> the means by which the __lru_cache_add() caller tells the LRU manager
> that the caller wishes this page to start life on the active LRU, yes? 
> It doesn't _really_ specify an LRU list at all.
> 

Correct. This?

---8<---
mm: Remove lru parameter from __lru_cache_add and lru_cache_add_lru

Similar to __pagevec_lru_add, this patch removes the LRU parameter
from __lru_cache_add and lru_cache_add_lru as the caller does not
control the exact LRU the page gets added to. Instead, require that
the caller set or clear PageActive depending on whether it should
be added to the active or inactive list.

[akpm@linux-foundation.org: Suggested the patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/swap.h | 11 +++++++----
 mm/rmap.c            |  7 ++++---
 mm/swap.c            | 14 ++++----------
 mm/vmscan.c          |  4 +---
 4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1701ce4..85d7437 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -10,6 +10,7 @@
 #include <linux/node.h>
 #include <linux/fs.h>
 #include <linux/atomic.h>
+#include <linux/page-flags.h>
 #include <asm/page.h>
 
 struct notifier_block;
@@ -233,8 +234,8 @@ extern unsigned long nr_free_pagecache_pages(void);
 
 
 /* linux/mm/swap.c */
-extern void __lru_cache_add(struct page *, enum lru_list lru);
-extern void lru_cache_add_lru(struct page *, enum lru_list lru);
+extern void __lru_cache_add(struct page *);
+extern void lru_cache_add(struct page *);
 extern void lru_add_page_tail(struct page *page, struct page *page_tail,
 			 struct lruvec *lruvec, struct list_head *head);
 extern void activate_page(struct page *);
@@ -254,12 +255,14 @@ extern void add_page_to_unevictable_list(struct page *page);
  */
 static inline void lru_cache_add_anon(struct page *page)
 {
-	__lru_cache_add(page, LRU_INACTIVE_ANON);
+	ClearPageActive(page);
+	__lru_cache_add(page);
 }
 
 static inline void lru_cache_add_file(struct page *page)
 {
-	__lru_cache_add(page, LRU_INACTIVE_FILE);
+	ClearPageActive(page);
+	__lru_cache_add(page);
 }
 
 /* linux/mm/vmscan.c */
diff --git a/mm/rmap.c b/mm/rmap.c
index 6280da8..e22ceeb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1093,9 +1093,10 @@ void page_add_new_anon_rmap(struct page *page,
 	else
 		__inc_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
 	__page_set_anon_rmap(page, vma, address, 1);
-	if (!mlocked_vma_newpage(vma, page))
-		lru_cache_add_lru(page, LRU_ACTIVE_ANON);
-	else
+	if (!mlocked_vma_newpage(vma, page)) {
+		SetPageActive(page);
+		lru_cache_add(page);
+	} else
 		add_page_to_unevictable_list(page);
 }
 
diff --git a/mm/swap.c b/mm/swap.c
index b8a9000..05944d4 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -496,15 +496,10 @@ EXPORT_SYMBOL(mark_page_accessed);
  * pages that are on the LRU, linear writes in subpage chunks would see
  * every PAGEVEC_SIZE page activated, which is unexpected.
  */
-void __lru_cache_add(struct page *page, enum lru_list lru)
+void __lru_cache_add(struct page *page)
 {
 	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
 
-	if (is_active_lru(lru))
-		SetPageActive(page);
-	else
-		ClearPageActive(page);
-
 	page_cache_get(page);
 	if (!pagevec_space(pvec))
 		__pagevec_lru_add(pvec);
@@ -514,11 +509,10 @@ void __lru_cache_add(struct page *page, enum lru_list lru)
 EXPORT_SYMBOL(__lru_cache_add);
 
 /**
- * lru_cache_add_lru - add a page to a page list
+ * lru_cache_add - add a page to a page list
  * @page: the page to be added to the LRU.
- * @lru: the LRU list to which the page is added.
  */
-void lru_cache_add_lru(struct page *page, enum lru_list lru)
+void lru_cache_add(struct page *page)
 {
 	if (PageActive(page)) {
 		VM_BUG_ON(PageUnevictable(page));
@@ -527,7 +521,7 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
 	}
 
 	VM_BUG_ON(PageLRU(page));
-	__lru_cache_add(page, lru);
+	__lru_cache_add(page);
 }
 
 /**
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa6a853..50088ba 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -546,7 +546,6 @@ int remove_mapping(struct address_space *mapping, struct page *page)
 void putback_lru_page(struct page *page)
 {
 	int lru;
-	int active = !!TestClearPageActive(page);
 	int was_unevictable = PageUnevictable(page);
 
 	VM_BUG_ON(PageLRU(page));
@@ -561,8 +560,7 @@ redo:
 		 * unevictable page on [in]active list.
 		 * We know how to handle that.
 		 */
-		lru = active + page_lru_base_type(page);
-		lru_cache_add_lru(page, lru);
+		lru_cache_add(page);
 	} else {
 		/*
 		 * Put unevictable pages directly on zone's unevictable

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] mm: pagevec: Defer deciding what LRU to add a page to until pagevec drain time
  2013-05-15 22:53     ` Andrew Morton
@ 2013-05-16 14:29       ` Mel Gorman
  -1 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2013-05-16 14:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexey Lyahkov, Andrew Perepechko, Robin Dong, Theodore Tso,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Bernd Schubert,
	David Howells, Trond Myklebust, Linux-fsdevel, Linux-ext4, LKML,
	Linux-mm

On Wed, May 15, 2013 at 03:53:30PM -0700, Andrew Morton wrote:
> On Mon, 13 May 2013 11:21:20 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > mark_page_accessed cannot activate an inactive page that is located on
> > an inactive LRU pagevec. Hints from filesystems may be ignored as a
> > result. In preparation for fixing that problem, this patch removes the
> > per-LRU pagevecs and leaves just one pagevec. The final LRU the page is
> > added to is deferred until the pagevec is drained.
> > 
> > This means that fewer pagevecs are available and potentially there is
> > greater contention on the LRU lock. However, this only applies in the case
> > where there is an almost perfect mix of file, anon, active and inactive
> > pages being added to the LRU. In practice I expect that we are adding
> > stream of pages of a particular time and that the changes in contention
> > will barely be measurable.
> > 
> > ...
> >
> > index c612a6a..0911579 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -39,7 +39,7 @@
> >  /* How many pages do we try to swap or page in/out together? */
> >  int page_cluster;
> >  
> > -static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
> > +static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
> >  static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
> >  static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
> >  
> > @@ -460,13 +460,18 @@ EXPORT_SYMBOL(mark_page_accessed);
> >   */
> 
> The comment preceding __lru_cache_add() needs an update.
> 

This?

---8<---
diff --git a/mm/swap.c b/mm/swap.c
index 05944d4..ac23602 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -489,12 +489,10 @@ void mark_page_accessed(struct page *page)
 EXPORT_SYMBOL(mark_page_accessed);
 
 /*
- * Order of operations is important: flush the pagevec when it's already
- * full, not when adding the last page, to make sure that last page is
- * not added to the LRU directly when passed to this function. Because
- * mark_page_accessed() (called after this when writing) only activates
- * pages that are on the LRU, linear writes in subpage chunks would see
- * every PAGEVEC_SIZE page activated, which is unexpected.
+ * Queue the page for addition to the LRU via pagevec. The decision on whether
+ * to add the page to the [in]active [file|anon] list is deferred until the
+ * pagevec is drained. This gives the caller of __lru_cache_add() a chance to
+ * have the page added to the active list using mark_page_accessed().
  */
 void __lru_cache_add(struct page *page)
 {

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/4] mm: Activate !PageLRU pages on mark_page_accessed if page is on local pagevec
  2013-05-16 13:41       ` Mel Gorman
@ 2013-05-20 22:09         ` Andrew Morton
  -1 siblings, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2013-05-20 22:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Alexey Lyahkov, Andrew Perepechko, Robin Dong, Theodore Tso,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Bernd Schubert,
	David Howells, Trond Myklebust, Linux-fsdevel, Linux-ext4, LKML,
	Linux-mm

On Thu, 16 May 2013 14:41:04 +0100 Mel Gorman <mgorman@suse.de> wrote:

> On Wed, May 15, 2013 at 03:55:00PM -0700, Andrew Morton wrote:
> > > @@ -441,8 +462,17 @@ void activate_page(struct page *page)
> > >  void mark_page_accessed(struct page *page)
> > >  {
> > >  	if (!PageActive(page) && !PageUnevictable(page) &&
> > > -			PageReferenced(page) && PageLRU(page)) {
> > > -		activate_page(page);
> > > +			PageReferenced(page)) {
> > > +
> > > +		/*
> > > +		 * If the page is on the LRU, promote immediately. Otherwise,
> > > +		 * assume the page is on a pagevec, mark it active and it'll
> > > +		 * be moved to the active LRU on the next drain
> > > +		 */
> > > +		if (PageLRU(page))
> > > +			activate_page(page);
> > > +		else
> > > +			__lru_cache_activate_page(page);
> > >  		ClearPageReferenced(page);
> > >  	} else if (!PageReferenced(page)) {
> > >  		SetPageReferenced(page);
> > 
> > For starters, activate_page() doesn't "promote immediately".  It sticks
> > the page into yet another pagevec for deferred activation.
> > 
> 
> True, comment updated.
> 
> > Also, I really worry about the fact that
> > activate_page()->drain->__activate_page() will simply skip over the
> > page if it has PageActive set!  So PageActive does something useful if
> > the page is in the add-to-lru pagevec but nothing useful if the page is
> > in the activate-it-soon pagevec.  This is a confusing, unobvious bug
> > attractant.
> > 
> 
> From mark_page_accessed, we only call activate_page() for !PageActive
> and PageLRU. The PageLRU check is key: if it's set, the pages *must* be on the
> inactive list or they'd trigger BUG_ON(PageActive) checks within
> vmscan.c. Am I missing your point?

I've forgotten what my point was.  I'll ramp back up when looking at
v2.  But this code is at the stage where it needs a state transition
diagram, or table.  Which makes one wonder if it's too damn complex.

Testing PageLRU while not holding lru_lock is always ... interesting.
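
[Editor's sketch: a first pass at the state table asked for above,
reconstructed from the code quoted in this thread; an approximation, not
authoritative.]

	mark_page_accessed() called on a page that is...   Effect
	------------------------------------------------   ------
	!PageReferenced (any other state)                  SetPageReferenced() only
	PageActive or PageUnevictable                      no activation
	PageReferenced, PageLRU (inactive list)            activate_page(): queued on the
	                                                   activate pagevec, moved to the
	                                                   active list at the next drain;
	                                                   PageReferenced cleared
	PageReferenced, !PageLRU, on this CPU's            SetPageActive(); reaches the
	lru_add_pvec                                       active list when that pagevec
	                                                   drains; PageReferenced cleared
	PageReferenced, !PageLRU, anywhere else            local search misses: no
	(remote pagevec, release, reclaim, migration)      activation, PageReferenced
	                                                   cleared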

> ...
>
> > Secondly, I really don't see how this code avoids the races.  Suppose
> > the page gets spilled from the to-add-to-lru pagevec and onto the real
> > LRU while mark_page_accessed() is concurrently executing. 
> 
> Good question. The key here is that __lru_cache_activate_page only
> searches the pagevec for the local CPU. If the current CPU is draining the
> to_add_to_lru pagevec, it cannot also be simultaneously setting PageActive
> in mark_page_accessed. It was discussed in the changelog here.
> 
> "Note that only pages on the local pagevec are considered on purpose. A
> !PageLRU page could be in the process of being released, reclaimed,
> migrated or on a remote pagevec that is currently being drained. Marking
> it PageActive is vulnerable to races where PageLRU and Active bits are
> checked at the wrong time."
> 
> Subtle comments on the code belong in the changelog, right?

Not if you want anyone to read them ;)

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-05-20 22:09 UTC | newest]

Thread overview: 15 messages
2013-05-13 10:21 [PATCH 0/4] Obey mark_page_accessed hint given by filesystems v2 Mel Gorman
2013-05-13 10:21 ` [PATCH 1/4] mm: Add tracepoints for LRU activation and insertions Mel Gorman
2013-05-15 17:39   ` Rik van Riel
2013-05-13 10:21 ` [PATCH 2/4] mm: pagevec: Defer deciding what LRU to add a page to until pagevec drain time Mel Gorman
2013-05-15 22:53   ` Andrew Morton
2013-05-16 14:29     ` Mel Gorman
2013-05-13 10:21 ` [PATCH 3/4] mm: Activate !PageLRU pages on mark_page_accessed if page is on local pagevec Mel Gorman
2013-05-15 17:40   ` Rik van Riel
2013-05-15 22:55   ` Andrew Morton
2013-05-16 13:41     ` Mel Gorman
2013-05-20 22:09       ` Andrew Morton
2013-05-13 10:21 ` [PATCH 4/4] mm: Remove lru parameter from __pagevec_lru_add and remove parts of pagevec API Mel Gorman
2013-05-15 17:46   ` Rik van Riel
2013-05-15 22:56   ` Andrew Morton
2013-05-16 14:19     ` Mel Gorman