linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] mm: Don't bother dropping mmap_sem for zero size readahead
@ 2020-02-12 22:16 Minchan Kim
  2020-02-12 22:16 ` [PATCH 2/3] mm: fix long time stall from mm_populate Minchan Kim
  2020-02-12 22:16 ` [PATCH 3/3] mm: make PageReadahead more strict Minchan Kim
  0 siblings, 2 replies; 6+ messages in thread
From: Minchan Kim @ 2020-02-12 22:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Jan Kara, Matthew Wilcox, Josef Bacik,
	Johannes Weiner, Minchan Kim, Robert Stupp

From: Jan Kara <jack@suse.cz>

When handling a page fault, we drop mmap_sem to start async readahead so
that we don't block on IO submission with mmap_sem held.  However
there's no point to drop mmap_sem in case readahead is disabled. Handle
that case to avoid pointless dropping of mmap_sem and retrying the
fault. This was actually reported to block mlockall(MCL_CURRENT)
indefinitely.

Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Reported-by: Minchan Kim <minchan@kernel.org>
Reported-by: Robert Stupp <snazy@gmx.de>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1784478270e1..5bffaa2176cd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2416,7 +2416,7 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
 	pgoff_t offset = vmf->pgoff;
 
 	/* If we don't want any read-ahead, don't bother */
-	if (vmf->vma->vm_flags & VM_RAND_READ)
+	if (vmf->vma->vm_flags & VM_RAND_READ || !ra->ra_pages)
 		return fpin;
 	if (ra->mmap_miss > 0)
 		ra->mmap_miss--;
-- 
2.25.0.225.g125e21ebc7-goog



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] mm: fix long time stall from mm_populate
  2020-02-12 22:16 [PATCH 1/3] mm: Don't bother dropping mmap_sem for zero size readahead Minchan Kim
@ 2020-02-12 22:16 ` Minchan Kim
  2020-04-21  4:12   ` Andrew Morton
  2020-02-12 22:16 ` [PATCH 3/3] mm: make PageReadahead more strict Minchan Kim
  1 sibling, 1 reply; 6+ messages in thread
From: Minchan Kim @ 2020-02-12 22:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Jan Kara, Matthew Wilcox, Josef Bacik,
	Johannes Weiner, Minchan Kim, Robert Stupp

Basically, fault handler releases mmap_sem before requesting readahead
and then it is supposed to retry lookup the page from page cache with
FAULT_FLAG_TRIED so that it avoids the live lock of infinite retry.

However, what happens if the fault handler find a page from page
cache and the page has readahead marker but are waiting under
writeback? Plus one more condition, it happens under mm_populate
which repeats faulting unless it encounters error. So let's assemble
conditions below.

__mm_populate
for (...)
  get_user_pages(faluty_address)
    handle_mm_fault
      filemap_fault
        find a page form page(PG_uptodate|PG_readahead|PG_writeback)
	it will return VM_FAULT_RETRY
  continue with faulty_address

IOW, it will repeat fault retry logic until the page will be written
back in the long run. It makes big spike latency of several seconds.

This patch solves the issue by turning off fault retry logic in second
trial.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/gup.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 1b521e0ac1de..b3f825092abf 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1196,6 +1196,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 	struct vm_area_struct *vma = NULL;
 	int locked = 0;
 	long ret = 0;
+	bool tried = false;
 
 	end = start + len;
 
@@ -1226,14 +1227,18 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 		 * double checks the vma flags, so that it won't mlock pages
 		 * if the vma was already munlocked.
 		 */
-		ret = populate_vma_page_range(vma, nstart, nend, &locked);
+		ret = populate_vma_page_range(vma, nstart, nend,
+						tried ? NULL : &locked);
 		if (ret < 0) {
 			if (ignore_errors) {
 				ret = 0;
 				continue;	/* continue at next VMA */
 			}
 			break;
-		}
+		} else if (ret == 0)
+			tried = true;
+		else
+			tried = false;
 		nend = nstart + ret * PAGE_SIZE;
 		ret = 0;
 	}
-- 
2.25.0.225.g125e21ebc7-goog



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] mm: make PageReadahead more strict
  2020-02-12 22:16 [PATCH 1/3] mm: Don't bother dropping mmap_sem for zero size readahead Minchan Kim
  2020-02-12 22:16 ` [PATCH 2/3] mm: fix long time stall from mm_populate Minchan Kim
@ 2020-02-12 22:16 ` Minchan Kim
  2020-02-17  9:31   ` Jan Kara
  1 sibling, 1 reply; 6+ messages in thread
From: Minchan Kim @ 2020-02-12 22:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Jan Kara, Matthew Wilcox, Josef Bacik,
	Johannes Weiner, Minchan Kim, Robert Stupp

PG_readahead flag is shared with PG_reclaim but PG_reclaim is only
used in write context while PG_readahead is used for read context.

To make it clear, let's introduce PageReadahead wrapper with
!PageWriteback so it could make code clear and we could drop
PageWriteback check in page_cache_async_readahead, which removes
pointless dropping mmap_sem.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/page-flags.h | 28 ++++++++++++++++++++++++++--
 mm/readahead.c             |  6 ------
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1bf83c8fcaa7..f91a9b2a49bd 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -363,8 +363,32 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
 PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
 	TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
-PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
-	TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+
+SETPAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+CLEARPAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+
+/*
+ * Since PG_readahead is shared with PG_reclaim of the page flags,
+ * PageReadahead should double check whether it's readahead marker
+ * or PG_reclaim. It could be done by PageWriteback check because
+ * PG_reclaim is always with PG_writeback.
+ */
+static inline int PageReadahead(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageCompound(page), page);
+
+	return (page->flags & (1UL << PG_reclaim | 1UL << PG_writeback)) ==
+		(1UL << PG_reclaim);
+}
+
+/* Clear PG_readahead only if it's PG_readahead, not PG_reclaim */
+static inline int TestClearPageReadahead(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageCompound(page), page);
+
+	return !PageWriteback(page) ||
+			test_and_clear_bit(PG_reclaim, &page->flags);
+}
 
 #ifdef CONFIG_HIGHMEM
 /*
diff --git a/mm/readahead.c b/mm/readahead.c
index 2fe72cd29b47..85b15e5a1d7b 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -553,12 +553,6 @@ page_cache_async_readahead(struct address_space *mapping,
 	if (!ra->ra_pages)
 		return;
 
-	/*
-	 * Same bit is used for PG_readahead and PG_reclaim.
-	 */
-	if (PageWriteback(page))
-		return;
-
 	ClearPageReadahead(page);
 
 	/*
-- 
2.25.0.225.g125e21ebc7-goog



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 3/3] mm: make PageReadahead more strict
  2020-02-12 22:16 ` [PATCH 3/3] mm: make PageReadahead more strict Minchan Kim
@ 2020-02-17  9:31   ` Jan Kara
  2020-02-18 22:28     ` Minchan Kim
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2020-02-17  9:31 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Jan Kara, Matthew Wilcox,
	Josef Bacik, Johannes Weiner, Robert Stupp

On Wed 12-02-20 14:16:14, Minchan Kim wrote:
> PG_readahead flag is shared with PG_reclaim but PG_reclaim is only
> used in write context while PG_readahead is used for read context.
> 
> To make it clear, let's introduce PageReadahead wrapper with
> !PageWriteback so it could make code clear and we could drop
> PageWriteback check in page_cache_async_readahead, which removes
> pointless dropping mmap_sem.
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>

...

> +/* Clear PG_readahead only if it's PG_readahead, not PG_reclaim */
> +static inline int TestClearPageReadahead(struct page *page)
> +{
> +	VM_BUG_ON_PGFLAGS(PageCompound(page), page);
> +
> +	return !PageWriteback(page) ||
> +			test_and_clear_bit(PG_reclaim, &page->flags);
> +}

I think this is still wrong - if PageWriteback is not set, it will never
clear PG_reclaim bit so effectively the page will stay in PageReadahead
state!

The logic you really want to implement is:

	if (PageReadahead(page)) { <- this is your new PageReadahead
				    implementation
		clear_bit(PG_reclaim, &page->flags);
		return 1;
	}
	return 0;

Now this has the problem that it is not atomic. The only way I see to make
this fully atomic is using cmpxchg(). If we wanted to make this kinda-sorta
OK, the proper condition would look like:

	return !PageWriteback(page) **&&**
			test_and_clear_bit(PG_reclaim, &page->flags);

Which is similar to what you originally had but different because in C '&&'
operator is not commutative due to side-effects committed at sequence points.

BTW: I share Andrew's view that we are piling hacks to fix problems caused
by older hacks. But I don't see any good option how to unalias
PG_readahead and PG_reclaim.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 3/3] mm: make PageReadahead more strict
  2020-02-17  9:31   ` Jan Kara
@ 2020-02-18 22:28     ` Minchan Kim
  0 siblings, 0 replies; 6+ messages in thread
From: Minchan Kim @ 2020-02-18 22:28 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, linux-mm, LKML, Matthew Wilcox, Josef Bacik,
	Johannes Weiner, Robert Stupp

On Mon, Feb 17, 2020 at 10:31:28AM +0100, Jan Kara wrote:
> On Wed 12-02-20 14:16:14, Minchan Kim wrote:
> > PG_readahead flag is shared with PG_reclaim but PG_reclaim is only
> > used in write context while PG_readahead is used for read context.
> > 
> > To make it clear, let's introduce PageReadahead wrapper with
> > !PageWriteback so it could make code clear and we could drop
> > PageWriteback check in page_cache_async_readahead, which removes
> > pointless dropping mmap_sem.
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> ...
> 
> > +/* Clear PG_readahead only if it's PG_readahead, not PG_reclaim */
> > +static inline int TestClearPageReadahead(struct page *page)
> > +{
> > +	VM_BUG_ON_PGFLAGS(PageCompound(page), page);
> > +
> > +	return !PageWriteback(page) ||
> > +			test_and_clear_bit(PG_reclaim, &page->flags);
> > +}
> 
> I think this is still wrong - if PageWriteback is not set, it will never
> clear PG_reclaim bit so effectively the page will stay in PageReadahead
> state!
> 
> The logic you really want to implement is:
> 
> 	if (PageReadahead(page)) { <- this is your new PageReadahead
> 				    implementation
> 		clear_bit(PG_reclaim, &page->flags);
> 		return 1;
> 	}
> 	return 0;
> 
> Now this has the problem that it is not atomic. The only way I see to make
> this fully atomic is using cmpxchg(). If we wanted to make this kinda-sorta
> OK, the proper condition would look like:
> 
> 	return !PageWriteback(page) **&&**
> 			test_and_clear_bit(PG_reclaim, &page->flags);
> 
> Which is similar to what you originally had but different because in C '&&'
> operator is not commutative due to side-effects committed at sequence points.

It's accurate. Thanks, Jan.

> 
> BTW: I share Andrew's view that we are piling hacks to fix problems caused
> by older hacks. But I don't see any good option how to unalias
> PG_readahead and PG_reclaim.

I believe it's okay to remove PG_writeback check in page_cache_async_readahead. 
It's merely optmization to accelerate page reclaim when the LRU victim page's
writeback is done by VM. If we removes the condition, we just lose few pages at
the moment for fast reclaim but finally they could be reclaimed in inactive LRU
and it would be *rare* in that that kinds of writeback from reclaim context
should be not common and doesn't affect system behavior a lot.


> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] mm: fix long time stall from mm_populate
  2020-02-12 22:16 ` [PATCH 2/3] mm: fix long time stall from mm_populate Minchan Kim
@ 2020-04-21  4:12   ` Andrew Morton
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2020-04-21  4:12 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, LKML, Jan Kara, Matthew Wilcox, Josef Bacik,
	Johannes Weiner, Robert Stupp

On Wed, 12 Feb 2020 14:16:13 -0800 Minchan Kim <minchan@kernel.org> wrote:

> Basically, fault handler releases mmap_sem before requesting readahead
> and then it is supposed to retry lookup the page from page cache with
> FAULT_FLAG_TRIED so that it avoids the live lock of infinite retry.
> 
> However, what happens if the fault handler find a page from page
> cache and the page has readahead marker but are waiting under
> writeback? Plus one more condition, it happens under mm_populate
> which repeats faulting unless it encounters error. So let's assemble
> conditions below.
> 
> __mm_populate
> for (...)
>   get_user_pages(faluty_address)
>     handle_mm_fault
>       filemap_fault
>         find a page form page(PG_uptodate|PG_readahead|PG_writeback)
> 	it will return VM_FAULT_RETRY
>   continue with faulty_address
> 
> IOW, it will repeat fault retry logic until the page will be written
> back in the long run. It makes big spike latency of several seconds.
> 
> This patch solves the issue by turning off fault retry logic in second
> trial.

I guess a resend would be helpful, to refresh memories.

Please mention in the changelog whether this "long time stall" has been
observed in practice.  If so, under what conditions, how often and how
long was it?  etcetera.




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-04-21  4:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-12 22:16 [PATCH 1/3] mm: Don't bother dropping mmap_sem for zero size readahead Minchan Kim
2020-02-12 22:16 ` [PATCH 2/3] mm: fix long time stall from mm_populate Minchan Kim
2020-04-21  4:12   ` Andrew Morton
2020-02-12 22:16 ` [PATCH 3/3] mm: make PageReadahead more strict Minchan Kim
2020-02-17  9:31   ` Jan Kara
2020-02-18 22:28     ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).