linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marc Duponcheel <marc@offline.be>
To: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>,
	Andy Lutomirski <luto@amacapital.net>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Marc Duponcheel <marc@offline.be>
Subject: Re: [3.6 regression?] THP + migration/compaction livelock (I think)
Date: Wed, 14 Nov 2012 14:29:40 +0100	[thread overview]
Message-ID: <20121114132940.GA13196@offline.be> (raw)
In-Reply-To: <20121114100154.GI8218@suse.de>

 Hi all

 If someone can provide the patches (or learn me how to get them with
git (I apologise to not be git savy)) then, this weekend, I can apply
them to 3.6.6 and compare before/after to check if they fix #49361.

 Thanks

On 2012 Nov 14, Mel Gorman wrote:
> On Tue, Nov 13, 2012 at 03:41:02PM -0800, David Rientjes wrote:
> > On Tue, 13 Nov 2012, Andy Lutomirski wrote:
> > 
> > > It just happened again.
> > > 
> > > $ grep -E "compact_|thp_" /proc/vmstat
> > > compact_blocks_moved 8332448774
> > > compact_pages_moved 21831286
> > > compact_pagemigrate_failed 211260
> > > compact_stall 13484
> > > compact_fail 6717
> > > compact_success 6755
> > > thp_fault_alloc 150665
> > > thp_fault_fallback 4270
> > > thp_collapse_alloc 19771
> > > thp_collapse_alloc_failed 2188
> > > thp_split 19600
> > > 
> > 
> > Two of the patches from the list provided at
> > http://marc.info/?l=linux-mm&m=135179005510688 are already in your 3.6.3 
> > kernel:
> > 
> > 	mm: compaction: abort compaction loop if lock is contended or run too long
> > 	mm: compaction: acquire the zone->lock as late as possible
> > 
> > and all have not made it to the 3.6 stable kernel yet, so would it be 
> > possible to try with 3.7-rc5 to see if it fixes the issue?  If so, it will 
> > indicate that the entire series is a candidate to backport to 3.6.
> 
> Thanks David once again.
> 
> The full list of compaction-related patches I believe are necessary for
> this particular problem are
> 
> e64c5237cf6ff474cb2f3f832f48f2b441dd9979 mm: compaction: abort compaction loop if lock is contended or run too long
> 3cc668f4e30fbd97b3c0574d8cac7a83903c9bc7 mm: compaction: move fatal signal check out of compact_checklock_irqsave
> 661c4cb9b829110cb68c18ea05a56be39f75a4d2 mm: compaction: Update try_to_compact_pages()kerneldoc comment
> 2a1402aa044b55c2d30ab0ed9405693ef06fb07c mm: compaction: acquire the zone->lru_lock as late as possible
> f40d1e42bb988d2a26e8e111ea4c4c7bac819b7e mm: compaction: acquire the zone->lock as late as possible
> 753341a4b85ff337487b9959c71c529f522004f4 revert "mm: have order > 0 compaction start off where it left"
> bb13ffeb9f6bfeb301443994dfbf29f91117dfb3 mm: compaction: cache if a pageblock was scanned and no pages were isolated
> c89511ab2f8fe2b47585e60da8af7fd213ec877e mm: compaction: Restart compaction from near where it left off
> 62997027ca5b3d4618198ed8b1aba40b61b1137b mm: compaction: clear PG_migrate_skip based on compaction and reclaim activity
> 0db63d7e25f96e2c6da925c002badf6f144ddf30 mm: compaction: correct the nr_strict va isolated check for CMA
> 
> If we can get confirmation that these fix the problem in 3.6 kernels then
> I can backport them to -stable. This fixing a problem where "many processes
> stall, all in an isolation-related function". This started happening after
> lumpy reclaim was removed because we depended on that to aggressively
> reclaim with less compaction. Now compaction is depended upon more.
> 
> The full 3.7-rc5 kernel has a different problem on top of this and it's
> important the problems do not get conflacted. It has these fixes *but*
> GFP_NO_KSWAPD has been removed and there is a patch that scales reclaim
> with THP failures that is causing problem. With them, kswapd can get
> stuck in a 100% loop where it is neither reclaiming nor reaching its exit
> conditions. The correct fix would be to identify why this happens but I
> have not got around to it yet. To test with 3.7-rc5 then apply either
> 
> 1) https://lkml.org/lkml/2012/11/5/308
> 2) https://lkml.org/lkml/2012/11/12/113
> 
> or
> 
> 1) https://lkml.org/lkml/2012/11/5/308
> 3) https://lkml.org/lkml/2012/11/12/151
> 
> on top of 3.7-rc5. So it's a lot of work but there are three tests I'm
> interested in hearing about. The results of each determine what happens
> in -stable or mainline
> 
> Test 1: 3.6 + the last of commits above	(should fix processes stick in isolate)
> Test 2: 3.7-rc5 + (1+2) above (should fix kswapd stuck at 100%)
> Test 3: 3.7-rc5 + (1+3) above (should fix kswapd stuck at 100% but better)
> 
> Thanks.
> 
> -- 
> Mel Gorman
> SUSE Labs
> 

-- 
--
 Marc Duponcheel
 Velodroomstraat 74 - 2600 Berchem - Belgium
 +32 (0)478 68.10.91 - marc@offline.be

  reply	other threads:[~2012-11-14 13:29 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-13 22:13 [3.6 regression?] THP + migration/compaction livelock (I think) Andy Lutomirski
2012-11-13 23:11 ` David Rientjes
2012-11-13 23:25   ` Andy Lutomirski
2012-11-13 23:41     ` David Rientjes
2012-11-13 23:45       ` Andy Lutomirski
2012-11-13 23:54         ` David Rientjes
2012-11-14  1:22           ` Marc Duponcheel
2012-11-14  1:51             ` David Rientjes
2012-11-14 13:21               ` Marc Duponcheel
2012-11-14 10:01       ` Mel Gorman
2012-11-14 13:29         ` Marc Duponcheel [this message]
2012-11-14 21:50           ` David Rientjes
2012-11-15  1:14             ` Marc Duponcheel
2012-11-17  0:18               ` Marc Duponcheel
2012-11-18 22:55                 ` David Rientjes
2012-12-05 19:23                   ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121114132940.GA13196@offline.be \
    --to=marc@offline.be \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=mgorman@suse.de \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).