* [PATCH v2] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
@ 2020-09-23 13:33 Yafang Shao
  2020-09-25 14:40 ` Mel Gorman
From: Yafang Shao @ 2020-09-23 13:33 UTC (permalink / raw)
  To: mgorman, hannes, mhocko, akpm; +Cc: linux-mm, Yafang Shao

Our users reported random latency spikes while their RT processes were
running. We eventually found that the spikes were caused by
FADV_DONTNEED, which may call lru_add_drain_all() to drain the LRU
cache on remote CPUs and then wait for the per-cpu work to complete.
The wait time is unbounded and can reach tens of milliseconds.
That behavior is unreasonable, because the process is bound to a
specific CPU and the file is accessed only by that process; IOW, there
should be no pagecache pages on a per-cpu pagevec of a remote CPU. The
unreasonable behavior is partially caused by an incorrect comparison of
the number of invalidated pages with the target number:
        if (count < (end_index - start_index + 1))
count is how many pages were invalidated on the local CPU, and
(end_index - start_index + 1) is how many pages should be invalidated.
Using (end_index - start_index + 1) is incorrect, because start_index
and end_index are page offsets within the file, which may not be backed
by pages in the page cache at all; besides that, there may be holes
between start and end. So instead check whether there are still pages
on a per-cpu pagevec after draining the local CPU, and only then decide
whether to call lru_add_drain_all().

After applying this as a hotfix to our production environment, most of
the lru_add_drain_all() calls can be avoided.

Suggested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/fs.h |  4 ++++
 mm/fadvise.c       |  9 +++----
 mm/truncate.c      | 58 ++++++++++++++++++++++++++++++++--------------
 3 files changed, 49 insertions(+), 22 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a08..6ba747b097c5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2591,6 +2591,10 @@ extern bool is_bad_inode(struct inode *);
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
 					pgoff_t start, pgoff_t end);
 
+void invalidate_mapping_pagevec(struct address_space *mapping,
+				pgoff_t start, pgoff_t end,
+				unsigned long *nr_pagevec);
+
 static inline void invalidate_remote_inode(struct inode *inode)
 {
 	if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 0e66f2aaeea3..d6baa4f451c5 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -141,7 +141,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 		}
 
 		if (end_index >= start_index) {
-			unsigned long count;
+			unsigned long nr_pagevec = 0;
 
 			/*
 			 * It's common to FADV_DONTNEED right after
@@ -154,8 +154,9 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 			 */
 			lru_add_drain();
 
-			count = invalidate_mapping_pages(mapping,
-						start_index, end_index);
+			invalidate_mapping_pagevec(mapping,
+						start_index, end_index,
+						&nr_pagevec);
 
 			/*
 			 * If fewer pages were invalidated than expected then
@@ -163,7 +164,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 			 * a per-cpu pagevec for a remote CPU. Drain all
 			 * pagevecs and try again.
 			 */
-			if (count < (end_index - start_index + 1)) {
+			if (nr_pagevec) {
 				lru_add_drain_all();
 				invalidate_mapping_pages(mapping, start_index,
 						end_index);
diff --git a/mm/truncate.c b/mm/truncate.c
index dd9ebc1da356..6bbe0f0b3ce9 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -528,23 +528,8 @@ void truncate_inode_pages_final(struct address_space *mapping)
 }
 EXPORT_SYMBOL(truncate_inode_pages_final);
 
-/**
- * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
- * @mapping: the address_space which holds the pages to invalidate
- * @start: the offset 'from' which to invalidate
- * @end: the offset 'to' which to invalidate (inclusive)
- *
- * This function only removes the unlocked pages, if you want to
- * remove all the pages of one inode, you must call truncate_inode_pages.
- *
- * invalidate_mapping_pages() will not block on IO activity. It will not
- * invalidate pages which are dirty, locked, under writeback or mapped into
- * pagetables.
- *
- * Return: the number of the pages that were invalidated
- */
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
-		pgoff_t start, pgoff_t end)
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+		pgoff_t start, pgoff_t end, unsigned long *nr_pagevec)
 {
 	pgoff_t indices[PAGEVEC_SIZE];
 	struct pagevec pvec;
@@ -610,8 +595,13 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			 * Invalidation is a hint that the page is no longer
 			 * of interest and try to speed up its reclaim.
 			 */
-			if (!ret)
+			if (!ret) {
 				deactivate_file_page(page);
+				/* It is likely on the pagevec of a remote CPU */
+				if (nr_pagevec)
+					(*nr_pagevec)++;
+			}
+
 			if (PageTransHuge(page))
 				put_page(page);
 			count += ret;
@@ -623,8 +613,40 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 	}
 	return count;
 }
+
+/**
+ * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
+ * @mapping: the address_space which holds the pages to invalidate
+ * @start: the offset 'from' which to invalidate
+ * @end: the offset 'to' which to invalidate (inclusive)
+ *
+ * This function only removes the unlocked pages, if you want to
+ * remove all the pages of one inode, you must call truncate_inode_pages.
+ *
+ * invalidate_mapping_pages() will not block on IO activity. It will not
+ * invalidate pages which are dirty, locked, under writeback or mapped into
+ * pagetables.
+ *
+ * Return: the number of the pages that were invalidated
+ */
+unsigned long invalidate_mapping_pages(struct address_space *mapping,
+		pgoff_t start, pgoff_t end)
+{
+	return __invalidate_mapping_pages(mapping, start, end, NULL);
+}
 EXPORT_SYMBOL(invalidate_mapping_pages);
 
+/**
+ * This helper is like invalidate_mapping_pages(), except that it accounts
+ * for pages that are likely on a per-cpu pagevec and counts them in
+ * @nr_pagevec, which is then used by the caller.
+ */
+void invalidate_mapping_pagevec(struct address_space *mapping,
+		pgoff_t start, pgoff_t end, unsigned long *nr_pagevec)
+{
+	__invalidate_mapping_pages(mapping, start, end, nr_pagevec);
+}
+
 /*
  * This is like invalidate_complete_page(), except it ignores the page's
  * refcount.  We do this because invalidate_inode_pages2() needs stronger
-- 
2.17.1




* Re: [PATCH v2] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
  2020-09-23 13:33 [PATCH v2] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED Yafang Shao
@ 2020-09-25 14:40 ` Mel Gorman
  2020-09-27  4:22   ` Yafang Shao
From: Mel Gorman @ 2020-09-25 14:40 UTC (permalink / raw)
  To: Yafang Shao; +Cc: hannes, mhocko, akpm, linux-mm

On Wed, Sep 23, 2020 at 09:33:18PM +0800, Yafang Shao wrote:
> Our users reported that there're some random latency spikes when their RT
> process is running. Finally we found that latency spike is caused by
> FADV_DONTNEED. Which may call lru_add_drain_all() to drain LRU cache on
> remote CPUs, and then waits the per-cpu work to complete. The wait time
> is uncertain, which may be tens millisecond.
> That behavior is unreasonable, because this process is bound to a
> specific CPU and the file is only accessed by itself, IOW, there should
> be no pagecache pages on a per-cpu pagevec of a remote CPU. That
> unreasonable behavior is partially caused by the wrong comparation of the
> number of invalidated pages and the number of the target. For example,
>         if (count < (end_index - start_index + 1))
> The count above is how many pages were invalidated in the local CPU, and
> (end_index - start_index + 1) is how many pages should be invalidated.
> The usage of (end_index - start_index + 1) is incorrect, because they
> are virtual addresses, which may not mapped to pages. Besides that,
> there may be holes between start and end. So we'd better check whether
> there are still pages on per-cpu pagevec after drain the local cpu, and
> then decide whether or not to call lru_add_drain_all().
> 
> After I applied it with a hotfix to our production environment, most of
> the lru_add_drain_all() can be avoided.
> 
> Suggested-by: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>

I think that's ok. Does it succeed with the original test case from the
commit that introduced the behaviour and one modified to truncate part
of the mapping?

-- 
Mel Gorman
SUSE Labs



* Re: [PATCH v2] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
  2020-09-25 14:40 ` Mel Gorman
@ 2020-09-27  4:22   ` Yafang Shao
  2020-09-28 10:22     ` Mel Gorman
From: Yafang Shao @ 2020-09-27  4:22 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Johannes Weiner, Michal Hocko, Andrew Morton, Linux MM

On Fri, Sep 25, 2020 at 10:40 PM Mel Gorman <mgorman@suse.de> wrote:
>
> On Wed, Sep 23, 2020 at 09:33:18PM +0800, Yafang Shao wrote:
> > [... full changelog quoted upthread, snipped ...]
>
> I think that's ok. Does it succeed with the original test case from the
> commit that introduced the behaviour and one modified to truncate part
> of the mapping?
>

Yes, I verified the test case from commit 67d46b296a1b, and also a
version modified to use truncate.
Both work fine.

-- 
Thanks
Yafang



* Re: [PATCH v2] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
  2020-09-27  4:22   ` Yafang Shao
@ 2020-09-28 10:22     ` Mel Gorman
From: Mel Gorman @ 2020-09-28 10:22 UTC (permalink / raw)
  To: Yafang Shao; +Cc: Johannes Weiner, Michal Hocko, Andrew Morton, Linux MM

On Sun, Sep 27, 2020 at 12:22:16PM +0800, Yafang Shao wrote:
> On Fri, Sep 25, 2020 at 10:40 PM Mel Gorman <mgorman@suse.de> wrote:
> >
> > On Wed, Sep 23, 2020 at 09:33:18PM +0800, Yafang Shao wrote:
> > > Our users reported that there're some random latency spikes when their RT
> > > process is running. Finally we found that latency spike is caused by
> > > FADV_DONTNEED. Which may call lru_add_drain_all() to drain LRU cache on
> > > remote CPUs, and then waits the per-cpu work to complete. The wait time
> > > is uncertain, which may be tens millisecond.
> > > That behavior is unreasonable, because this process is bound to a
> > > specific CPU and the file is only accessed by itself, IOW, there should
> > > be no pagecache pages on a per-cpu pagevec of a remote CPU. That
> > > unreasonable behavior is partially caused by the wrong comparation of the
> > > number of invalidated pages and the number of the target. For example,
> > >         if (count < (end_index - start_index + 1))
> > > The count above is how many pages were invalidated in the local CPU, and
> > > (end_index - start_index + 1) is how many pages should be invalidated.
> > > The usage of (end_index - start_index + 1) is incorrect, because they
> > > are virtual addresses, which may not mapped to pages. Besides that,
> > > there may be holes between start and end. So we'd better check whether
> > > there are still pages on per-cpu pagevec after drain the local cpu, and
> > > then decide whether or not to call lru_add_drain_all().
> > >
> > > After I applied it with a hotfix to our production environment, most of
> > > the lru_add_drain_all() can be avoided.
> > >
> > > Suggested-by: Mel Gorman <mgorman@suse.de>
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> >
> > I think that's ok. Does it succeed with the original test case from the
> > commit that introduced the behaviour and one modified to truncate part
> > of the mapping?
> >
> 
> Yes, I verified the test case from commit 67d46b296a1b, and also a
> version modified to use truncate.
> Both work fine.
> 

In that case

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs


