Date: Thu, 19 Mar 2015 18:29:47 -0700
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
From: Linus Torvalds
To: Dave Chinner
Cc: Mel Gorman, Ingo Molnar, Andrew Morton, Aneesh Kumar,
    Linux Kernel Mailing List, Linux-MM, xfs@oss.sgi.com, ppc-dev

On Thu, Mar 19, 2015 at 5:23 PM, Dave Chinner wrote:
>
> Bit more variance there than the pte checking, but runtime
> difference is in the noise - 5m4s vs 4m54s - and profiles are
> identical to the pte checking version.

Ahh, so that "!(vma->vm_flags & VM_WRITE)" test works _almost_ as well
as the original !pte_write() test.

Now, can you check that on top of rc4? If I've gotten everything
right, we now have:

 - plain 3.19 (pte_write): 4m54s
 - 3.19 with vm_flags & VM_WRITE: 5m4s
 - 3.19 with pte_dirty: 5m20s

so the pte_dirty version seems to have been a bad choice indeed.

For 4.0-rc4, (which uses pte_dirty) you had 7m50s, so it's still
_much_ worse, but I'm wondering whether that VM_WRITE test will at
least shrink the difference like it does for 3.19.

And the VM_WRITE test should be stable and not have any subtle
interaction with the other changes that the numa pte things
introduced. It would be good to see if the profiles then pop something
*else* up as the performance difference (which I'm sure will remain,
since the 7m50s was so far off).

                  Linus