From: "Liang, Kan" <kan.liang@intel.com>
To: 'Mel Gorman' <mgorman@techsingularity.net>,
'Linus Torvalds' <torvalds@linux-foundation.org>
Cc: 'Mel Gorman' <mgorman@suse.de>,
"'Kirill A. Shutemov'" <kirill.shutemov@linux.intel.com>,
'Tim Chen' <tim.c.chen@linux.intel.com>,
'Peter Zijlstra' <peterz@infradead.org>,
'Ingo Molnar' <mingo@elte.hu>, 'Andi Kleen' <ak@linux.intel.com>,
'Andrew Morton' <akpm@linux-foundation.org>,
'Johannes Weiner' <hannes@cmpxchg.org>, 'Jan Kara' <jack@suse.cz>,
'linux-mm' <linux-mm@kvack.org>,
'Linux Kernel Mailing List' <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH 1/2] sched/wait: Break up long wake list walk
Date: Tue, 22 Aug 2017 17:23:47 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F0775378A24A@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F07753788B58@SHSMSX103.ccr.corp.intel.com>
> > Covering both paths would be something like the patch below which
> > spins until the page is unlocked or it should reschedule. It's not
> > even boot tested as I spent what time I had on the test case that I
> > hoped would be able to prove it really works.
>
> I will give it a try.
Although the patch doesn't trigger the watchdog, the spin lock wait time
is still large (about 0.45 s).
It may get worse again on larger systems.
Irqsoff ftrace result:
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 4.13.0-rc4+
# --------------------------------------------------------------------
# latency: 451753 us, #4/4, CPU#159 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:224)
# -----------------
# | task: fjsctest-233851 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: wake_up_page_bit
# => ended at: wake_up_page_bit
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<...>-233851 159d... 0us@: _raw_spin_lock_irqsave <-wake_up_page_bit
<...>-233851 159dN.. 451726us+: _raw_spin_unlock_irqrestore <-wake_up_page_bit
<...>-233851 159dN.. 451754us!: trace_hardirqs_on <-wake_up_page_bit
<...>-233851 159dN.. 451873us : <stack trace>
=> unlock_page
=> migrate_pages
=> migrate_misplaced_page
=> __handle_mm_fault
=> handle_mm_fault
=> __do_page_fault
=> do_page_fault
=> page_fault
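The trace above shows a single irqs-off section in wake_up_page_bit(), from _raw_spin_lock_irqsave to _raw_spin_unlock_irqrestore, lasting about 451.7 ms. As the subject of this thread suggests, the lock is held for the whole wake-list walk, so the hold time grows with the number of queued waiters. Below is a minimal userspace sketch of that cost model. It assumes a pthread mutex in place of the irq-safe spinlock, and struct waiter, struct wait_queue, add_waiter() and wake_all() are hypothetical names, not kernel APIs.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a page wait queue: one lock protects a singly
 * linked list of waiters. Names are illustrative, not kernel APIs. */
struct waiter {
	struct waiter *next;
	int woken;
};

struct wait_queue {
	pthread_mutex_t lock;
	struct waiter *head;
};

/* Queue a waiter at the head of the list. */
void add_waiter(struct wait_queue *q, struct waiter *w)
{
	pthread_mutex_lock(&q->lock);
	w->woken = 0;
	w->next = q->head;
	q->head = w;
	pthread_mutex_unlock(&q->lock);
}

/* Wake every queued waiter inside a single critical section and return
 * how many entries were visited while the lock was held. The hold time
 * grows linearly with that count. */
size_t wake_all(struct wait_queue *q)
{
	size_t visited = 0;

	pthread_mutex_lock(&q->lock);
	for (struct waiter *w = q->head; w != NULL; w = w->next) {
		w->woken = 1;	/* stand-in for waking the sleeping task */
		visited++;
	}
	q->head = NULL;		/* every entry has been consumed */
	pthread_mutex_unlock(&q->lock);
	return visited;
}
```

For example, queueing 1000 waiters and then calling wake_all() visits all 1000 entries inside one lock hold.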
The call stack of wait_on_page_bit_common:
100.00% (ffffffff971b252b)
|
---__spinwait_on_page_locked
|
|--96.81%--__migration_entry_wait
| migration_entry_wait
| do_swap_page
| __handle_mm_fault
| handle_mm_fault
| __do_page_fault
| do_page_fault
| page_fault
| |
| |--22.49%--0x123a2
| | |
| | --22.34%--start_thread
| |
| |--15.69%--0x127bc
| | |
| | --13.20%--start_thread
| |
| |--13.48%--0x12352
| | |
| | --11.74%--start_thread
| |
| |--13.43%--0x127f2
| | |
| | --11.25%--start_thread
| |
| |--10.03%--0x1285e
| | |
| | --8.59%--start_thread
| |
| |--5.90%--0x12894
| | |
| | --5.03%--start_thread
| |
| |--5.66%--0x12828
| | |
| | --4.81%--start_thread
| |
| |--5.17%--0x1233c
| | |
| | --4.46%--start_thread
| |
| --4.72%--0x2b788
| |
| --4.72%--0x127a2
| start_thread
|
--3.19%--do_huge_pmd_numa_page
__handle_mm_fault
handle_mm_fault
__do_page_fault
do_page_fault
page_fault
0x2b788
0x127a2
start_thread
>
> >
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index 79b36f57c3ba..31cda1288176 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -517,6 +517,13 @@ static inline void wait_on_page_locked(struct page *page)
> >  		wait_on_page_bit(compound_head(page), PG_locked);
> >  }
> >
> > +void __spinwait_on_page_locked(struct page *page);
> > +static inline void spinwait_on_page_locked(struct page *page)
> > +{
> > +	if (PageLocked(page))
> > +		__spinwait_on_page_locked(page);
> > +}
> > +
> >  static inline int wait_on_page_locked_killable(struct page *page)
> >  {
> >  	if (!PageLocked(page))
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index a49702445ce0..c9d6f49614bc 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1210,6 +1210,15 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
> >  	}
> >  }
> >
> > +void __spinwait_on_page_locked(struct page *page)
> > +{
> > +	do {
> > +		cpu_relax();
> > +	} while (PageLocked(page) && !cond_resched());
> > +
> > +	wait_on_page_locked(page);
> > +}
> > +
> >  /**
> >   * page_cache_next_hole - find the next hole (not-present entry)
> >   * @mapping: mapping
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 90731e3b7e58..c7025c806420 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1443,7 +1443,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
> >  		if (!get_page_unless_zero(page))
> >  			goto out_unlock;
> >  		spin_unlock(vmf->ptl);
> > -		wait_on_page_locked(page);
> > +		spinwait_on_page_locked(page);
> >  		put_page(page);
> >  		goto out;
> >  	}
> > @@ -1480,7 +1480,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
> >  		if (!get_page_unless_zero(page))
> >  			goto out_unlock;
> >  		spin_unlock(vmf->ptl);
> > -		wait_on_page_locked(page);
> > +		spinwait_on_page_locked(page);
> >  		put_page(page);
> >  		goto out;
> >  	}
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index e84eeb4e4356..9b6c3fc5beac 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -308,7 +308,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> >  	if (!get_page_unless_zero(page))
> >  		goto out;
> >  	pte_unmap_unlock(ptep, ptl);
> > -	wait_on_page_locked(page);
> > +	spinwait_on_page_locked(page);
> >  	put_page(page);
> >  	return;
> >  out:
> >
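The idea behind __spinwait_on_page_locked() in the quoted patch is to spin while the page is locked and no reschedule is due, then fall back to the sleeping wait. Below is a minimal userspace sketch of that spin-then-sleep pattern. It assumes C11 atomics and a pthread mutex/condvar pair in place of PG_locked and the page wait queue. struct page_lock, page_lock_init(), page_lock_release() and spinwait_locked() are hypothetical names, and a fixed spin budget stands in for the !cond_resched() exit condition.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a page lock bit with a sleeping
 * wait path. Names are illustrative, not kernel APIs. */
struct page_lock {
	atomic_bool locked;
	pthread_mutex_t mtx;
	pthread_cond_t unlocked;
};

void page_lock_init(struct page_lock *pl, bool locked)
{
	atomic_init(&pl->locked, locked);
	pthread_mutex_init(&pl->mtx, NULL);
	pthread_cond_init(&pl->unlocked, NULL);
}

/* Clear the lock bit under the mutex so no sleeper misses the wakeup. */
void page_lock_release(struct page_lock *pl)
{
	pthread_mutex_lock(&pl->mtx);
	atomic_store(&pl->locked, false);
	pthread_cond_broadcast(&pl->unlocked);	/* wake every sleeper */
	pthread_mutex_unlock(&pl->mtx);
}

/* Sleeping wait, playing the role of wait_on_page_locked(). */
static void wait_locked(struct page_lock *pl)
{
	pthread_mutex_lock(&pl->mtx);
	while (atomic_load(&pl->locked))
		pthread_cond_wait(&pl->unlocked, &pl->mtx);
	pthread_mutex_unlock(&pl->mtx);
}

/* Busy-poll for at most spin_budget iterations (assumed stand-in for
 * "!cond_resched()"), then sleep. Returns true if the lock was seen
 * released while spinning, false if the sleeping path was taken. */
bool spinwait_locked(struct page_lock *pl, unsigned long spin_budget)
{
	while (atomic_load(&pl->locked)) {
		if (spin_budget-- == 0) {
			wait_locked(pl);	/* fall back to sleeping */
			return false;
		}
		/* cpu_relax() has no portable userspace equivalent; just poll */
	}
	return true;
}
```

A waiter that sees the lock released while spinning never joins the sleeping path, while one whose budget runs out still does. That matches the call stack above, where wait_on_page_bit_common() is still reached from __spinwait_on_page_locked().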