linux-mm.kvack.org archive mirror
From: "Liang, Kan" <kan.liang@intel.com>
To: 'Mel Gorman' <mgorman@techsingularity.net>,
	'Linus Torvalds' <torvalds@linux-foundation.org>
Cc: 'Mel Gorman' <mgorman@suse.de>,
	"'Kirill A. Shutemov'" <kirill.shutemov@linux.intel.com>,
	'Tim Chen' <tim.c.chen@linux.intel.com>,
	'Peter Zijlstra' <peterz@infradead.org>,
	'Ingo Molnar' <mingo@elte.hu>, 'Andi Kleen' <ak@linux.intel.com>,
	'Andrew Morton' <akpm@linux-foundation.org>,
	'Johannes Weiner' <hannes@cmpxchg.org>, 'Jan Kara' <jack@suse.cz>,
	'linux-mm' <linux-mm@kvack.org>,
	'Linux Kernel Mailing List' <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH 1/2] sched/wait: Break up long wake list walk
Date: Tue, 22 Aug 2017 17:23:47 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F0775378A24A@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F07753788B58@SHSMSX103.ccr.corp.intel.com>


> > Covering both paths would be something like the patch below which
> > spins until the page is unlocked or it should reschedule. It's not
> > even boot tested as I spent what time I had on the test case that I
> > hoped would be able to prove it really works.
> 
> I will give it a try.

Although the patch doesn't trigger the watchdog, the spin lock wait time
is still not small (0.45s).
It may get worse again on larger systems.


irqsoff ftrace result:
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 4.13.0-rc4+
# --------------------------------------------------------------------
# latency: 451753 us, #4/4, CPU#159 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:224)
#    -----------------
#    | task: fjsctest-233851 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: wake_up_page_bit
#  => ended at:   wake_up_page_bit
#
#
#                  _------=> CPU#            
#                 / _-----=> irqs-off        
#                | / _----=> need-resched    
#                || / _---=> hardirq/softirq 
#                ||| / _--=> preempt-depth   
#                |||| /     delay            
#  cmd     pid   ||||| time  |   caller      
#     \   /      |||||  \    |   /         
   <...>-233851 159d...    0us@: _raw_spin_lock_irqsave <-wake_up_page_bit
   <...>-233851 159dN.. 451726us+: _raw_spin_unlock_irqrestore <-wake_up_page_bit
   <...>-233851 159dN.. 451754us!: trace_hardirqs_on <-wake_up_page_bit
   <...>-233851 159dN.. 451873us : <stack trace>
 => unlock_page
 => migrate_pages
 => migrate_misplaced_page
 => __handle_mm_fault
 => handle_mm_fault
 => __do_page_fault
 => do_page_fault
 => page_fault


The call stack of wait_on_page_bit_common:

   100.00%  (ffffffff971b252b)
            |
            ---__spinwait_on_page_locked
               |          
               |--96.81%--__migration_entry_wait
               |          migration_entry_wait
               |          do_swap_page
               |          __handle_mm_fault
               |          handle_mm_fault
               |          __do_page_fault
               |          do_page_fault
               |          page_fault
               |          |          
               |          |--22.49%--0x123a2
               |          |          |          
               |          |           --22.34%--start_thread
               |          |          
               |          |--15.69%--0x127bc
               |          |          |          
               |          |           --13.20%--start_thread
               |          |          
               |          |--13.48%--0x12352
               |          |          |          
               |          |           --11.74%--start_thread
               |          |          
               |          |--13.43%--0x127f2
               |          |          |          
               |          |           --11.25%--start_thread
               |          |          
               |          |--10.03%--0x1285e
               |          |          |          
               |          |           --8.59%--start_thread
               |          |          
               |          |--5.90%--0x12894
               |          |          |          
               |          |           --5.03%--start_thread
               |          |          
               |          |--5.66%--0x12828
               |          |          |          
               |          |           --4.81%--start_thread
               |          |          
               |          |--5.17%--0x1233c
               |          |          |          
               |          |           --4.46%--start_thread
               |          |          
               |           --4.72%--0x2b788
               |                     |          
               |                      --4.72%--0x127a2
               |                                start_thread
               |          
                --3.19%--do_huge_pmd_numa_page
                          __handle_mm_fault
                          handle_mm_fault
                          __do_page_fault
                          do_page_fault
                          page_fault
                          0x2b788
                          0x127a2
                          start_thread


> 
> >
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index 79b36f57c3ba..31cda1288176 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -517,6 +517,13 @@ static inline void wait_on_page_locked(struct page *page)
> >  		wait_on_page_bit(compound_head(page), PG_locked);
> >  }
> >
> > +void __spinwait_on_page_locked(struct page *page);
> > +static inline void spinwait_on_page_locked(struct page *page)
> > +{
> > +	if (PageLocked(page))
> > +		__spinwait_on_page_locked(page);
> > +}
> > +
> >  static inline int wait_on_page_locked_killable(struct page *page)
> >  {
> >  	if (!PageLocked(page))
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index a49702445ce0..c9d6f49614bc 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1210,6 +1210,15 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
> >  	}
> >  }
> >
> > +void __spinwait_on_page_locked(struct page *page)
> > +{
> > +	do {
> > +		cpu_relax();
> > +	} while (PageLocked(page) && !cond_resched());
> > +
> > +	wait_on_page_locked(page);
> > +}
> > +
> >  /**
> >   * page_cache_next_hole - find the next hole (not-present entry)
> >   * @mapping: mapping
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 90731e3b7e58..c7025c806420 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1443,7 +1443,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
> >  		if (!get_page_unless_zero(page))
> >  			goto out_unlock;
> >  		spin_unlock(vmf->ptl);
> > -		wait_on_page_locked(page);
> > +		spinwait_on_page_locked(page);
> >  		put_page(page);
> >  		goto out;
> >  	}
> > @@ -1480,7 +1480,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
> >  		if (!get_page_unless_zero(page))
> >  			goto out_unlock;
> >  		spin_unlock(vmf->ptl);
> > -		wait_on_page_locked(page);
> > +		spinwait_on_page_locked(page);
> >  		put_page(page);
> >  		goto out;
> >  	}
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index e84eeb4e4356..9b6c3fc5beac 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -308,7 +308,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> >  	if (!get_page_unless_zero(page))
> >  		goto out;
> >  	pte_unmap_unlock(ptep, ptl);
> > -	wait_on_page_locked(page);
> > +	spinwait_on_page_locked(page);
> >  	put_page(page);
> >  	return;
> >  out:
> >
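
For anyone following along outside the kernel tree, the spin-then-block
idea in the quoted patch can be sketched in user space roughly as below.
This is only an illustration under assumptions, not kernel code: struct
fake_page, the fake_* helpers and the spin_budget parameter are all
hypothetical names, a plain iteration count stands in for the point where
cond_resched() reports a reschedule, and a pthread condition variable
stands in for the page wait queue.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page and its PG_locked bit. */
struct fake_page {
	atomic_bool locked;
	pthread_mutex_t mtx;
	pthread_cond_t unlocked;
};

static void fake_page_init(struct fake_page *p, bool locked)
{
	atomic_init(&p->locked, locked);
	pthread_mutex_init(&p->mtx, NULL);
	pthread_cond_init(&p->unlocked, NULL);
}

/* Rough analogue of unlock_page(): clear the bit, then wake sleepers. */
static void fake_unlock_page(struct fake_page *p)
{
	pthread_mutex_lock(&p->mtx);
	atomic_store(&p->locked, false);
	pthread_cond_broadcast(&p->unlocked);
	pthread_mutex_unlock(&p->mtx);
}

/* Rough analogue of wait_on_page_locked(): sleep until the bit clears. */
static void fake_wait_on_page_locked(struct fake_page *p)
{
	pthread_mutex_lock(&p->mtx);
	while (atomic_load(&p->locked))
		pthread_cond_wait(&p->unlocked, &p->mtx);
	pthread_mutex_unlock(&p->mtx);
}

/*
 * Spin while the page is locked, up to spin_budget iterations (an
 * assumed stand-in for "until cond_resched() says we rescheduled"),
 * then fall through to the blocking wait, as the quoted patch does.
 * Returns true if the page was still locked when the spin phase ended.
 */
static bool fake_spinwait_on_page_locked(struct fake_page *p,
					 unsigned int spin_budget)
{
	unsigned int spins = 0;
	bool still_locked;

	while (atomic_load(&p->locked) && spins < spin_budget)
		spins++;	/* the kernel patch calls cpu_relax() here */

	still_locked = atomic_load(&p->locked);
	fake_wait_on_page_locked(p);	/* returns at once if unlocked */
	return still_locked;
}
```

In this sketch a waiter that sees the bit clear during the spin phase
never sleeps on the condition variable; only waiters that exhaust the
budget take the mutex and block, which is roughly the trade-off the patch
is after.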


