Date: Fri, 20 Mar 2015 10:23:26 +1100
From: Dave Chinner
To: Linus Torvalds
Cc: Mel Gorman, Ingo Molnar, Andrew Morton, Aneesh Kumar,
    Linux Kernel Mailing List, Linux-MM, xfs@oss.sgi.com, ppc-dev
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
Message-ID: <20150319232326.GM10105@dastard>
References: <20150317070655.GB10105@dastard> <20150317205104.GA28621@dastard>
    <20150317220840.GC28621@dastard> <20150319224143.GI10105@dastard>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 19, 2015 at 04:05:46PM -0700, Linus Torvalds wrote:
> On Thu, Mar 19, 2015 at 3:41 PM, Dave Chinner wrote:
> >
> > My recollection wasn't faulty - I pulled it from an earlier email.
> > That said, the original measurement might have been faulty. I ran
> > the numbers again on the 3.19 kernel I saved away from the original
> > testing. That came up at 235k, which is pretty much the same as
> > yesterday's test. The runtime, however, is unchanged from my original
> > measurements of 4m54s (pte_hack came in at 5m20s).
>
> Ok. Good. So the "more than an order of magnitude difference" was
> really about measurement differences, not quite as real. Looks like
> more a "factor of two" than a factor of 20.
>
> Did you do the profiles the same way? Because that would explain the
> differences in the TLB flush percentages too (the "1.4% from
> tlb_invalidate_range()" vs "pretty much everything from migration").

No, the profiles all came from steady state. The profiles from the
initial startup phase hammer the mmap_sem because of page fault vs
mprotect contention (glibc runs mprotect() on every chunk of memory
it allocates). It's not until the cache reaches "full" and it starts
recycling old buffers rather than allocating new ones that the tlb
flush problem dominates the profiles.

> The runtime variation does show that there's some *big* subtle
> difference for the numa balancing in the exact TNF_NO_GROUP details.
> It must be *very* unstable for it to make that big of a difference.
> But I feel at least a *bit* better about "unstable algorithm changes a
> small variation into a factor-of-two" vs that crazy factor-of-20.
>
> Can you try Mel's change to make it use
>
>         if (!(vma->vm_flags & VM_WRITE))
>
> instead of the pte details? Again, on otherwise plain 3.19, just so
> that we have a baseline. I'd be *so* much happier with checking the vma
> details over per-pte details, especially ones that change over the
> lifetime of the pte entry, and the NUMA code explicitly mucks with.

Yup, will do. Might take an hour or two before I get to it, though...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
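
[Illustration of the mprotect() pattern mentioned in the reply above.
This is a minimal userspace sketch, not glibc's actual code: it assumes
the allocator reserves address space with PROT_NONE and then opens it up
piecemeal with mprotect() as it hands out memory. The helper names
arena_reserve() and arena_grow() are hypothetical. Each arena_grow()
call is one mprotect() system call, which on a 3.19-era kernel takes
mmap_sem for write, while concurrent page faults in the same process
take it for read. That would be the contention seen during startup.]

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve `reserve` bytes of address space with
 * no access rights. Nothing is usable until arena_grow() is called.
 * Returns NULL on failure.
 */
static void *arena_reserve(size_t reserve)
{
    void *p = mmap(NULL, reserve, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: make bytes [old_top, new_top) of the reserved
 * region readable and writable. One call is one mprotect() syscall.
 * Returns 0 on success, -1 on failure (as mprotect() does).
 */
static int arena_grow(void *base, size_t old_top, size_t new_top)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* mprotect() needs a page-aligned start address. */
    assert(old_top % page == 0 && new_top > old_top);

    return mprotect((char *)base + old_top, new_top - old_top,
                    PROT_READ | PROT_WRITE);
}
```

[A caller would reserve once and then call arena_grow() every time the
allocator needs more usable memory, so a workload that allocates many
new buffers issues many mprotect() calls. Once buffers are recycled
instead of freshly allocated, those calls stop, which matches the
startup vs steady-state split described above.]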