linux-mm.kvack.org archive mirror
* [PATCH] mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
@ 2016-12-10 17:26 Johannes Weiner
  2016-12-12  9:21 ` Vlastimil Babka
  2016-12-12  9:51 ` [PATCH] " Mel Gorman
  0 siblings, 2 replies; 7+ messages in thread
From: Johannes Weiner @ 2016-12-10 17:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, linux-mm, linux-kernel, kernel-team

When FADV_DONTNEED cannot drop all pages in the range, it observes
that some pages might still be on per-cpu LRU caches after recent
instantiation and so initiates remote calls to all CPUs to flush their
local caches. However, in most cases the fadvise happens from the
same context that instantiated the pages, so any pre-LRU pages in the
specified range are most likely sitting on the local CPU's LRU cache.
The remote calls are then unnecessary in many cases, and on a loaded
system they can hold up the fadvise() call significantly.

Try to avoid the remote call by flushing the local LRU cache before
even attempting to invalidate anything. It's a cheap operation, and
the local LRU cache is the most likely to hold any pre-LRU pages in
the specified fadvise range.
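
To make the mechanism concrete, here is a toy userspace model in C (not kernel code). Newly instantiated pages wait in a per-CPU cache and can only be invalidated once they reach the LRU list. `dontneed()` invalidates a range and, if it could not drop every page, drains all CPUs and tries again; its `local_first` flag models the lru_add_drain() this patch adds. All names (`drain_cpu()`, `drain_all()`, `CACHE_MAX` and so on) are hypothetical stand-ins for lru_add_drain(), lru_add_drain_all() and the per-CPU pagevecs, and the model ignores the other reasons a page can fail to be invalidated, such as being dirty or mapped.

```c
#include <stdbool.h>
#include <string.h>

#define NR_CPUS    4
#define NR_PAGES   64
#define CACHE_MAX  15	/* arbitrary per-CPU cache capacity for the model */

struct model_page {
	bool cached;	/* still present in the page cache */
	bool on_lru;	/* reached the LRU list, so it can be invalidated */
};

static struct model_page pages[NR_PAGES];
static int cpu_cache[NR_CPUS][CACHE_MAX];	/* per-CPU "add to LRU" caches */
static int cpu_cache_len[NR_CPUS];
static int remote_drains;			/* number of all-CPU drains */

static void reset_model(void)
{
	memset(pages, 0, sizeof(pages));
	memset(cpu_cache_len, 0, sizeof(cpu_cache_len));
	remote_drains = 0;
}

/* Move every page in @cpu's cache onto the LRU (cf. lru_add_drain()). */
static void drain_cpu(int cpu)
{
	for (int i = 0; i < cpu_cache_len[cpu]; i++)
		pages[cpu_cache[cpu][i]].on_lru = true;
	cpu_cache_len[cpu] = 0;
}

/* Drain every CPU's cache (cf. lru_add_drain_all()): the expensive path. */
static void drain_all(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		drain_cpu(cpu);
	remote_drains++;
}

/* A read or write on @cpu instantiates page @idx; it waits in that CPU's cache. */
static void add_page(int cpu, int idx)
{
	if (cpu_cache_len[cpu] == CACHE_MAX)
		drain_cpu(cpu);	/* a full cache is flushed to the LRU */
	pages[idx].cached = true;
	pages[idx].on_lru = false;
	cpu_cache[cpu][cpu_cache_len[cpu]++] = idx;
}

/* Drop cached pages in [start, end] that are on the LRU; return how many. */
static int invalidate_range(int start, int end)
{
	int count = 0;

	for (int i = start; i <= end; i++) {
		if (pages[i].cached && pages[i].on_lru) {
			pages[i].cached = false;
			count++;
		}
	}
	return count;
}

/* FADV_DONTNEED issued on @this_cpu; @local_first models the patch. */
static int dontneed(int this_cpu, int start, int end, bool local_first)
{
	int count;

	if (local_first)
		drain_cpu(this_cpu);
	count = invalidate_range(start, end);
	if (count < end - start + 1) {
		/* Some pages may still sit on remote CPUs' caches. */
		drain_all();
		count += invalidate_range(start, end);
	}
	return count;
}
```

With eight pages instantiated on CPU 0, `dontneed(0, 0, 7, false)` needs one all-CPU drain to drop them, while `dontneed(0, 0, 7, true)` drops them with the local drain alone. If the pages were instantiated on another CPU, the local drain finds nothing and the model falls back to the all-CPU drain, so it is never worse than before.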

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/fadvise.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index 6c707bfe02fd..a43013112581 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -139,7 +139,20 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
 		}
 
 		if (end_index >= start_index) {
-			unsigned long count = invalidate_mapping_pages(mapping,
+			unsigned long count;
+
+			/*
+			 * It's common to FADV_DONTNEED right after
+			 * the read or write that instantiates the
+			 * pages, in which case there will be some
+			 * sitting on the local LRU cache. Try to
+			 * avoid the expensive remote drain and the
+			 * second cache tree walk below by flushing
+			 * them out right away.
+			 */
+			lru_add_drain();
+
+			count = invalidate_mapping_pages(mapping,
 						start_index, end_index);
 
 			/*
-- 
2.10.2
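
The comment in the patch notes that FADV_DONTNEED commonly follows the read or write that instantiated the pages. For the write side, a minimal userspace sketch of that sequence might look like this. `write_and_drop()` is a hypothetical helper; it calls fdatasync() first because dirty pages cannot be dropped from the page cache until they have been written back, and it assumes the descriptor was not opened with O_APPEND, so the current offset is where the data lands.

```c
#define _XOPEN_SOURCE 700	/* posix_fadvise(), fdatasync() */
#include <fcntl.h>
#include <unistd.h>

/*
 * Write @len bytes from @buf at @fd's current offset, then drop the page
 * cache for exactly that range. Hypothetical helper; assumes @fd was not
 * opened with O_APPEND. Returns 0 on success, -1 on failure.
 */
static int write_and_drop(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	off_t start;
	size_t done = 0;

	if (len == 0)
		return 0;	/* len 0 would mean "to EOF" to posix_fadvise() */
	start = lseek(fd, 0, SEEK_CUR);	/* where the data will land */
	if (start < 0)
		return -1;
	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);

		if (n <= 0)
			return -1;
		done += (size_t)n;
	}
	/* Dirty pages can't be invalidated, so write them back first. */
	if (fdatasync(fd) != 0)
		return -1;
	/* posix_fadvise() returns an error number instead of setting errno. */
	if (posix_fadvise(fd, start, (off_t)len, POSIX_FADV_DONTNEED) != 0)
		return -1;
	return 0;
}
```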

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
  2016-12-10 17:26 [PATCH] mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED Johannes Weiner
@ 2016-12-12  9:21 ` Vlastimil Babka
  2016-12-12 15:55   ` Johannes Weiner
  2016-12-12  9:51 ` [PATCH] " Mel Gorman
  1 sibling, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2016-12-12  9:21 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Mel Gorman, linux-mm, linux-kernel, kernel-team

On 12/10/2016 06:26 PM, Johannes Weiner wrote:
> When FADV_DONTNEED cannot drop all pages in the range, it observes
> that some pages might still be on per-cpu LRU caches after recent
> instantiation and so initiates remote calls to all CPUs to flush their
> local caches. However, in most cases, the fadvise happens from the
> same context that instantiated the pages, and any pre-LRU pages in the
> specified range are most likely sitting on the local CPU's LRU cache,
> and so in many cases this results in unnecessary remote calls, which,
> in a loaded system, can hold up the fadvise() call significantly.

Got any numbers for this part?

> Try to avoid the remote call by flushing the local LRU cache before
> even attempting to invalidate anything. It's a cheap operation, and
> the local LRU cache is the most likely to hold any pre-LRU pages in
> the specified fadvise range.

Anyway it looks like things can't be worse after this patch, so...

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/fadvise.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/mm/fadvise.c b/mm/fadvise.c
> index 6c707bfe02fd..a43013112581 100644
> --- a/mm/fadvise.c
> +++ b/mm/fadvise.c
> @@ -139,7 +139,20 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
>  		}
>
>  		if (end_index >= start_index) {
> -			unsigned long count = invalidate_mapping_pages(mapping,
> +			unsigned long count;
> +
> +			/*
> +			 * It's common to FADV_DONTNEED right after
> +			 * the read or write that instantiates the
> +			 * pages, in which case there will be some
> +			 * sitting on the local LRU cache. Try to
> +			 * avoid the expensive remote drain and the
> +			 * second cache tree walk below by flushing
> +			 * them out right away.
> +			 */
> +			lru_add_drain();
> +
> +			count = invalidate_mapping_pages(mapping,
>  						start_index, end_index);
>
>  			/*
>

* Re: [PATCH] mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
  2016-12-10 17:26 [PATCH] mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED Johannes Weiner
  2016-12-12  9:21 ` Vlastimil Babka
@ 2016-12-12  9:51 ` Mel Gorman
  1 sibling, 0 replies; 7+ messages in thread
From: Mel Gorman @ 2016-12-12  9:51 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team

On Sat, Dec 10, 2016 at 12:26:58PM -0500, Johannes Weiner wrote:
> When FADV_DONTNEED cannot drop all pages in the range, it observes
> that some pages might still be on per-cpu LRU caches after recent
> instantiation and so initiates remote calls to all CPUs to flush their
> local caches. However, in most cases, the fadvise happens from the
> same context that instantiated the pages, and any pre-LRU pages in the
> specified range are most likely sitting on the local CPU's LRU cache,
> and so in many cases this results in unnecessary remote calls, which,
> in a loaded system, can hold up the fadvise() call significantly.
> 
> Try to avoid the remote call by flushing the local LRU cache before
> even attempting to invalidate anything. It's a cheap operation, and
> the local LRU cache is the most likely to hold any pre-LRU pages in
> the specified fadvise range.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH] mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
  2016-12-12  9:21 ` Vlastimil Babka
@ 2016-12-12 15:55   ` Johannes Weiner
  2016-12-13 12:32     ` Vlastimil Babka
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Weiner @ 2016-12-12 15:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, kernel-team

On Mon, Dec 12, 2016 at 10:21:24AM +0100, Vlastimil Babka wrote:
> On 12/10/2016 06:26 PM, Johannes Weiner wrote:
> > When FADV_DONTNEED cannot drop all pages in the range, it observes
> > that some pages might still be on per-cpu LRU caches after recent
> > instantiation and so initiates remote calls to all CPUs to flush their
> > local caches. However, in most cases, the fadvise happens from the
> > same context that instantiated the pages, and any pre-LRU pages in the
> > specified range are most likely sitting on the local CPU's LRU cache,
> > and so in many cases this results in unnecessary remote calls, which,
> > in a loaded system, can hold up the fadvise() call significantly.
> 
> Got any numbers for this part?

I didn't record it in the extreme case we observed, unfortunately. We
had a slow-to-respond system and noticed it spending seconds in
lru_add_drain_all() after fadvise calls, and this patch came out of
thinking about the code and how we commonly call FADV_DONTNEED.

FWIW, I wrote a silly directory tree walker/searcher that recurses
through /usr to read and FADV_DONTNEED each file it finds. On a
2-socket, 40-hyperthread machine, over 1% is spent in
lru_add_drain_all(). With the patch, that cost is gone; the local
drain cost shows up at 0.09%.
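
The walker itself was not posted. A minimal sketch of what such a tool might look like, assuming nftw() for the traversal and a whole-file posix_fadvise(POSIX_FADV_DONTNEED) after reading each file to EOF; `walk_and_drop()` and its helpers are hypothetical names, and the "searcher" part is omitted:

```c
#define _XOPEN_SOURCE 700	/* nftw(), posix_fadvise() */
#include <fcntl.h>
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

static long files_dropped;	/* files read and successfully advised */

/* Read @path to EOF, then advise that its cached pages aren't needed. */
static int read_and_drop(const char *path)
{
	char buf[65536];
	int fd, err;

	/* O_NONBLOCK so a stray FIFO cannot stall the walk */
	fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return -1;
	/* Populate the page cache for the whole file (errors end the loop). */
	while (read(fd, buf, sizeof(buf)) > 0)
		;
	/* offset 0 and len 0 cover the entire file */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	return err ? -1 : 0;
}

/* nftw() callback: read and drop files (FTW_F); keep walking on errors. */
static int visit(const char *path, const struct stat *sb, int type,
		 struct FTW *ftwbuf)
{
	(void)sb;
	(void)ftwbuf;
	if (type == FTW_F && read_and_drop(path) == 0)
		files_dropped++;
	return 0;
}

/* Walk @root without following symlinks; return files dropped, or -1. */
static long walk_and_drop(const char *root)
{
	files_dropped = 0;
	if (nftw(root, visit, 16, FTW_PHYS) != 0)
		return -1;
	return files_dropped;
}
```

Calling `walk_and_drop("/usr")` returns the number of files read and advised, or -1 if the walk itself fails. Each posix_fadvise() call is issued by the same task right after the read that populated the cache, which is the pattern the patch optimizes for.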

> > Try to avoid the remote call by flushing the local LRU cache before
> > even attempting to invalidate anything. It's a cheap operation, and
> > the local LRU cache is the most likely to hold any pre-LRU pages in
> > the specified fadvise range.
> 
> Anyway it looks like things can't be worse after this patch, so...
> 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

* Re: [PATCH] mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
  2016-12-12 15:55   ` Johannes Weiner
@ 2016-12-13 12:32     ` Vlastimil Babka
  2016-12-14 21:00       ` [PATCH v2] " Johannes Weiner
  0 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2016-12-13 12:32 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, kernel-team

On 12/12/2016 04:55 PM, Johannes Weiner wrote:
> On Mon, Dec 12, 2016 at 10:21:24AM +0100, Vlastimil Babka wrote:
>> On 12/10/2016 06:26 PM, Johannes Weiner wrote:
>>> When FADV_DONTNEED cannot drop all pages in the range, it observes
>>> that some pages might still be on per-cpu LRU caches after recent
>>> instantiation and so initiates remote calls to all CPUs to flush their
>>> local caches. However, in most cases, the fadvise happens from the
>>> same context that instantiated the pages, and any pre-LRU pages in the
>>> specified range are most likely sitting on the local CPU's LRU cache,
>>> and so in many cases this results in unnecessary remote calls, which,
>>> in a loaded system, can hold up the fadvise() call significantly.
>>
>> Got any numbers for this part?
>
> I didn't record it in the extreme case we observed, unfortunately. We
> had a slow-to-respond system and noticed it spending seconds in
> lru_add_drain_all() after fadvise calls, and this patch came out of
> thinking about the code and how we commonly call FADV_DONTNEED.
>
> FWIW, I wrote a silly directory tree walker/searcher that recurses
> through /usr to read and FADV_DONTNEED each file it finds. On a 2
> socket 40 ht machine, over 1% is spent in lru_add_drain_all(). With
> the patch, that cost is gone; the local drain cost shows at 0.09%.

Thanks, worth adding to changelog :)

Vlastimil

* [PATCH v2] mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
  2016-12-13 12:32     ` Vlastimil Babka
@ 2016-12-14 21:00       ` Johannes Weiner
  2016-12-15  4:09         ` Hillf Danton
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Weiner @ 2016-12-14 21:00 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, kernel-team

When FADV_DONTNEED cannot drop all pages in the range, it observes
that some pages might still be on per-cpu LRU caches after recent
instantiation and so initiates remote calls to all CPUs to flush their
local caches. However, in most cases the fadvise happens from the
same context that instantiated the pages, so any pre-LRU pages in the
specified range are most likely sitting on the local CPU's LRU cache.
The remote calls are then unnecessary in many cases, and on a loaded
system they can hold up the fadvise() call significantly.

[ I didn't record numbers in the extreme case we observed at Facebook,
  unfortunately. We had a slow-to-respond system and noticed
  lru_add_drain_all() leading the profile during fadvise calls. This
  patch came out of thinking about the code and how we commonly call
  FADV_DONTNEED.

  FWIW, I wrote a silly directory tree walker/searcher that recurses
  through /usr to read and FADV_DONTNEED each file it finds. On a
  2-socket, 40-hyperthread machine, over 1% is spent in
  lru_add_drain_all(). With the patch, that cost is gone; the local
  drain cost shows up at 0.09%. ]

Try to avoid the remote call by flushing the local LRU cache before
even attempting to invalidate anything. It's a cheap operation, and
the local LRU cache is the most likely to hold any pre-LRU pages in
the specified fadvise range.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
---
 mm/fadvise.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index 6c707bfe02fd..a43013112581 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -139,7 +139,20 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
 		}
 
 		if (end_index >= start_index) {
-			unsigned long count = invalidate_mapping_pages(mapping,
+			unsigned long count;
+
+			/*
+			 * It's common to FADV_DONTNEED right after
+			 * the read or write that instantiates the
+			 * pages, in which case there will be some
+			 * sitting on the local LRU cache. Try to
+			 * avoid the expensive remote drain and the
+			 * second cache tree walk below by flushing
+			 * them out right away.
+			 */
+			lru_add_drain();
+
+			count = invalidate_mapping_pages(mapping,
 						start_index, end_index);
 
 			/*
-- 
2.10.2

* Re: [PATCH v2] mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
  2016-12-14 21:00       ` [PATCH v2] " Johannes Weiner
@ 2016-12-15  4:09         ` Hillf Danton
  0 siblings, 0 replies; 7+ messages in thread
From: Hillf Danton @ 2016-12-15  4:09 UTC (permalink / raw)
  To: 'Johannes Weiner', 'Vlastimil Babka'
  Cc: 'Andrew Morton', 'Mel Gorman',
	linux-mm, linux-kernel, kernel-team

On Thursday, December 15, 2016 5:00 AM Johannes Weiner wrote: 
> When FADV_DONTNEED cannot drop all pages in the range, it observes
> that some pages might still be on per-cpu LRU caches after recent
> instantiation and so initiates remote calls to all CPUs to flush their
> local caches. However, in most cases, the fadvise happens from the
> same context that instantiated the pages, and any pre-LRU pages in the
> specified range are most likely sitting on the local CPU's LRU cache,
> and so in many cases this results in unnecessary remote calls, which,
> in a loaded system, can hold up the fadvise() call significantly.
> 
> [ I didn't record numbers in the extreme case we observed at Facebook,
>   unfortunately. We had a slow-to-respond system and noticed
>   lru_add_drain_all() leading the profile during fadvise calls. This
>   patch came out of thinking about the code and how we commonly call
>   FADV_DONTNEED.
> 
>   FWIW, I wrote a silly directory tree walker/searcher that recurses
>   through /usr to read and FADV_DONTNEED each file it finds. On a
>   2-socket, 40-hyperthread machine, over 1% is spent in
>   lru_add_drain_all(). With the patch, that cost is gone; the local
>   drain cost shows up at 0.09%. ]
> 
> Try to avoid the remote call by flushing the local LRU cache before
> even attempting to invalidate anything. It's a cheap operation, and
> the local LRU cache is the most likely to hold any pre-LRU pages in
> the specified fadvise range.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Mel Gorman <mgorman@suse.de>
> ---
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>

>  mm/fadvise.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/fadvise.c b/mm/fadvise.c
> index 6c707bfe02fd..a43013112581 100644
> --- a/mm/fadvise.c
> +++ b/mm/fadvise.c
> @@ -139,7 +139,20 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
>  		}
> 
>  		if (end_index >= start_index) {
> -			unsigned long count = invalidate_mapping_pages(mapping,
> +			unsigned long count;
> +
> +			/*
> +			 * It's common to FADV_DONTNEED right after
> +			 * the read or write that instantiates the
> +			 * pages, in which case there will be some
> +			 * sitting on the local LRU cache. Try to
> +			 * avoid the expensive remote drain and the
> +			 * second cache tree walk below by flushing
> +			 * them out right away.
> +			 */
> +			lru_add_drain();
> +
> +			count = invalidate_mapping_pages(mapping,
>  						start_index, end_index);
> 
>  			/*
> --
> 2.10.2

Thread overview: 7+ messages
2016-12-10 17:26 [PATCH] mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED Johannes Weiner
2016-12-12  9:21 ` Vlastimil Babka
2016-12-12 15:55   ` Johannes Weiner
2016-12-13 12:32     ` Vlastimil Babka
2016-12-14 21:00       ` [PATCH v2] " Johannes Weiner
2016-12-15  4:09         ` Hillf Danton
2016-12-12  9:51 ` [PATCH] " Mel Gorman
