From: Mel Gorman <mgorman@techsingularity.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Liang, Kan" <kan.liang@intel.com>, Mel Gorman <mgorman@suse.de>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>, Andi Kleen <ak@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>, Jan Kara <jack@suse.cz>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
Date: Mon, 21 Aug 2017 19:32:34 +0100
Message-ID: <20170821183234.kzennaaw2zt2rbwz@techsingularity.net>
In-Reply-To: <CA+55aFwX0yrUPULrDxTWVCg5c6DKh-yCG84NXVxaptXNQ4O_kA@mail.gmail.com>

On Fri, Aug 18, 2017 at 12:14:12PM -0700, Linus Torvalds wrote:
> On Fri, Aug 18, 2017 at 11:54 AM, Mel Gorman
> <mgorman@techsingularity.net> wrote:
> >
> > One option to mitigate (but not eliminate) the problem is to record when
> > the page lock is contended and pass in TNF_PAGE_CONTENDED (new flag) to
> > task_numa_fault().
> 
> Well, finding it contended is fairly easy - just look at the page wait
> queue, and if it's not empty, assume it's due to contention.
> 

Yes.

> I also wonder if we could be even *more* hacky, and in the whole
> __migration_entry_wait() path, change the logic from:
> 
>  - wait on page lock before retrying the fault
> 
> to
> 
>  - yield()
> 
> which is hacky, but there's a rationale for it:
> 
>  (a) avoid the crazy long wait queues ;)
> 
>  (b) we know that migration is *supposed* to be CPU-bound (not IO
> bound), so yielding the CPU and retrying may just be the right thing
> to do.
> 

Potentially. I spent a few hours trying to construct a test case that
would migrate constantly and could be used as a basis for evaluating a
patch or alternative. Unfortunately it was not as easy as I thought and
I have yet to construct a case that causes migration storms resulting
in multiple threads waiting on a single page.

> Because that code sequence doesn't actually depend on
> "wait_on_page_lock()" for _correctness_ anyway, afaik. Anybody who
> does "migration_entry_wait()" _has_ to retry anyway, since the page
> table contents may have changed by waiting.
> 
> So I'm not proud of the attached patch, and I don't think it's really
> acceptable as-is, but maybe it's worth testing? And maybe it's
> arguably no worse than what we have now?
> 
> Comments?
> 

The transhuge migration path for numa balancing doesn't go through the
migration_entry_wait path, despite similarly named functions that suggest
it does, so this may only have much effect when THP is disabled. It's
worth trying anyway.

Covering both paths would be something like the patch below which spins
until the page is unlocked or it should reschedule. It's not even boot
tested as I spent what time I had on the test case that I hoped would be
able to prove it really works.

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 79b36f57c3ba..31cda1288176 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -517,6 +517,13 @@ static inline void wait_on_page_locked(struct page *page)
 		wait_on_page_bit(compound_head(page), PG_locked);
 }
 
+void __spinwait_on_page_locked(struct page *page);
+static inline void spinwait_on_page_locked(struct page *page)
+{
+	if (PageLocked(page))
+		__spinwait_on_page_locked(page);
+}
+
 static inline int wait_on_page_locked_killable(struct page *page)
 {
 	if (!PageLocked(page))
diff --git a/mm/filemap.c b/mm/filemap.c
index a49702445ce0..c9d6f49614bc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1210,6 +1210,15 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 	}
 }
 
+void __spinwait_on_page_locked(struct page *page)
+{
+	do {
+		cpu_relax();
+	} while (PageLocked(page) && !cond_resched());
+
+	wait_on_page_locked(page);
+}
+
 /**
  * page_cache_next_hole - find the next hole (not-present entry)
  * @mapping: mapping
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 90731e3b7e58..c7025c806420 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1443,7 +1443,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 		if (!get_page_unless_zero(page))
 			goto out_unlock;
 		spin_unlock(vmf->ptl);
-		wait_on_page_locked(page);
+		spinwait_on_page_locked(page);
 		put_page(page);
 		goto out;
 	}
@@ -1480,7 +1480,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 		if (!get_page_unless_zero(page))
 			goto out_unlock;
 		spin_unlock(vmf->ptl);
-		wait_on_page_locked(page);
+		spinwait_on_page_locked(page);
 		put_page(page);
 		goto out;
 	}
diff --git a/mm/migrate.c b/mm/migrate.c
index e84eeb4e4356..9b6c3fc5beac 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -308,7 +308,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 	if (!get_page_unless_zero(page))
 		goto out;
 	pte_unmap_unlock(ptep, ptl);
-	wait_on_page_locked(page);
+	spinwait_on_page_locked(page);
 	put_page(page);
 	return;
 out:
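
For readers who want to experiment with the spin-then-sleep idea outside the
kernel, the following is a minimal userspace sketch and not part of the patch.
Everything in it is an assumption made for illustration: struct fake_page, the
fake_* helpers and SPIN_BUDGET are hypothetical names, sched_yield() stands in
for cpu_relax()/cond_resched(), and a pthread condition variable stands in for
the page wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a struct page with a PG_locked bit. */
struct fake_page {
	atomic_bool locked;
	pthread_mutex_t mtx;
	pthread_cond_t unlocked;	/* plays the role of the page wait queue */
};

/* Assumed spin budget; the kernel patch spins until cond_resched() fires. */
#define SPIN_BUDGET 1000

static void fake_page_init(struct fake_page *p, bool locked)
{
	atomic_init(&p->locked, locked);
	pthread_mutex_init(&p->mtx, NULL);
	pthread_cond_init(&p->unlocked, NULL);
}

static void fake_unlock_page(struct fake_page *p)
{
	pthread_mutex_lock(&p->mtx);
	atomic_store(&p->locked, false);
	pthread_cond_broadcast(&p->unlocked);	/* wake any sleeping waiters */
	pthread_mutex_unlock(&p->mtx);
}

/*
 * Spin briefly in the hope that the holder releases the lock soon, then
 * fall back to a sleeping wait. Returns true if the spin phase was enough,
 * false if the caller had to sleep.
 */
static bool fake_spinwait_on_page_locked(struct fake_page *p)
{
	for (int i = 0; i < SPIN_BUDGET; i++) {
		if (!atomic_load(&p->locked))
			return true;
		sched_yield();	/* let the lock holder make progress */
	}

	/* Spin budget exhausted: queue up and sleep like a normal waiter. */
	pthread_mutex_lock(&p->mtx);
	while (atomic_load(&p->locked))
		pthread_cond_wait(&p->unlocked, &p->mtx);
	assert(!atomic_load(&p->locked));
	pthread_mutex_unlock(&p->mtx);
	return false;
}
```

As in the patch, a waiter only joins the queue once spinning has failed, so
short lock hold times never grow the wait list. The boolean return value is
an addition of this sketch so that a test can observe which path was taken.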



