From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754043AbbCSXFt (ORCPT ); Thu, 19 Mar 2015 19:05:49 -0400
Received: from mail-ig0-f178.google.com ([209.85.213.178]:35420 "EHLO
	mail-ig0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752980AbbCSXFr (ORCPT );
	Thu, 19 Mar 2015 19:05:47 -0400
MIME-Version: 1.0
In-Reply-To: <20150319224143.GI10105@dastard>
References: <20150312184925.GH3406@suse.de>
	<20150317070655.GB10105@dastard>
	<20150317205104.GA28621@dastard>
	<20150317220840.GC28621@dastard>
	<20150319224143.GI10105@dastard>
Date: Thu, 19 Mar 2015 16:05:46 -0700
X-Google-Sender-Auth: SkjKtiXyuXov1HVgpccbO8xVO8U
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
From: Linus Torvalds
To: Dave Chinner
Cc: Mel Gorman, Ingo Molnar, Andrew Morton, Aneesh Kumar,
	Linux Kernel Mailing List, Linux-MM, xfs@oss.sgi.com, ppc-dev
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 19, 2015 at 3:41 PM, Dave Chinner wrote:
>
> My recollection wasn't faulty - I pulled it from an earlier email.
> That said, the original measurement might have been faulty. I ran
> the numbers again on the 3.19 kernel I saved away from the original
> testing. That came up at 235k, which is pretty much the same as
> yesterday's test. The runtime,however, is unchanged from my original
> measurements of 4m54s (pte_hack came in at 5m20s).

Ok. Good.

So the "more than an order of magnitude difference" was really about
measurement differences, not quite as real. Looks like more a "factor
of two" than a factor of 20.

Did you do the profiles the same way? Because that would explain the
differences in the TLB flush percentages too (the "1.4% from
tlb_invalidate_range()" vs "pretty much everything from migration").


The runtime variation does show that there's some *big* subtle
difference for the numa balancing in the exact TNF_NO_GROUP details.
It must be *very* unstable for it to make that big of a difference.
But I feel at least a *bit* better about "unstable algorithm changes a
small variation into a factor-of-two" vs that crazy factor-of-20.

Can you try Mel's change to make it use

        if (!(vma->vm_flags & VM_WRITE))

instead of the pte details? Again, on otherwise plain 3.19, just so
that we have a baseline. I'd be *so* much happier with checking the vma
details over per-pte details, especially ones that change over the
lifetime of the pte entry, and the NUMA code explicitly mucks with.

                           Linus
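
For readers following the thread, here is a minimal sketch of the two ways
of making the decision discussed above. It is not the kernel's code and not
Mel's patch: struct vma_sketch, fault_flags_from_pte() and
fault_flags_from_vma() are hypothetical names, the constants are
illustrative stand-ins, and it assumes that the check in question decides
whether TNF_NO_GROUP is set for a NUMA hinting fault.

```c
#include <stdbool.h>

/* Illustrative stand-ins for this sketch, not taken from kernel headers. */
#define VM_WRITE     0x2UL  /* the mapping permits writes */
#define TNF_NO_GROUP 0x02   /* fault should not feed task grouping */

/* Hypothetical, stripped-down stand-in for a VMA. */
struct vma_sketch {
    unsigned long vm_flags;
};

/*
 * Per-pte variant: the result depends on a bit that can change over the
 * lifetime of the pte entry, so the same page may be classified
 * differently from one fault to the next.
 */
int fault_flags_from_pte(bool pte_writable)
{
    return pte_writable ? 0 : TNF_NO_GROUP;
}

/*
 * Per-vma variant: the result depends only on the permissions of the
 * mapping, which stay the same unless the mapping itself is changed.
 */
int fault_flags_from_vma(const struct vma_sketch *vma)
{
    int flags = 0;

    if (!(vma->vm_flags & VM_WRITE))
        flags |= TNF_NO_GROUP;
    return flags;
}
```

In this sketch a page in a writable mapping whose pte is currently
write-protected gets TNF_NO_GROUP from the per-pte variant but not from the
per-vma variant, which is the kind of difference the requested baseline run
on plain 3.19 would expose.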