From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751865Ab3LLXRj (ORCPT ); Thu, 12 Dec 2013 18:17:39 -0500
Received: from relay2.sgi.com ([192.48.179.30]:45645 "EHLO relay.sgi.com"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1751579Ab3LLXRf (ORCPT ); Thu, 12 Dec 2013 18:17:35 -0500
Date: Thu, 12 Dec 2013 17:17:30 -0600
From: Alex Thorlton
To: Rik van Riel
Cc: linux-mm@kvack.org, Andrew Morton, "Kirill A. Shutemov",
        Benjamin Herrenschmidt, Wanpeng Li, Mel Gorman, Michel Lespinasse,
        Benjamin LaHaise, Oleg Nesterov, "Eric W. Biederman",
        Andy Lutomirski, Al Viro, David Rientjes, Zhang Yanfei,
        Peter Zijlstra, Johannes Weiner, Michal Hocko, Jiang Liu,
        Cody P Schafer, Glauber Costa, Kamezawa Hiroyuki, Naoya Horiguchi,
        linux-kernel@vger.kernel.org, Andrea Arcangeli
Subject: Re: [RFC PATCH 2/3] Add tunable to control THP behavior
Message-ID: <20131212231730.GD6034@sgi.com>
References: <20131212180050.GC134240@sgi.com> <52AA2C87.5040509@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <52AA2C87.5040509@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 12, 2013 at 04:37:11PM -0500, Rik van Riel wrote:
> On 12/12/2013 01:00 PM, Alex Thorlton wrote:
> >This part of the patch adds a tunable to
> >/sys/kernel/mm/transparent_hugepage called threshold.  This threshold
> >determines how many pages a user must fault in from a single node before
> >a temporary compound page is turned into a THP.
>
> >+++ b/mm/huge_memory.c
> >@@ -44,6 +44,9 @@ unsigned long transparent_hugepage_flags __read_mostly =
> >  (1<
> >  (1<
> >
> >+/* default to 1 page threshold for handing out thps; maintains old behavior */
> >+static int transparent_hugepage_threshold = 1;
>
> I assume the motivation for writing all this code is that "1"
> was not a good value in your tests.

Yes, that's correct.

> That makes me wonder, why should 1 be the default value with
> your patches?

The main reason I set the default to 1 is that the majority of jobs
aren't hurt by the existing THP behavior.  I figured it would be best to
default to having things behave the same as they do now, but provide the
option to increase the threshold on systems that run jobs that could be
adversely affected by the current behavior.

> If there is a better value, why should we not use that?
>
> What is the upside of using a better value?
>
> What is the downside?

The problem here is that the "better" value can vary greatly depending
on how a particular task allocates memory.  Setting the threshold too
high can hurt the performance of jobs that behave well with the current
behavior; setting it too low won't yield a performance increase for the
jobs that are hurt by the current behavior.  With some more thorough
testing, I'm sure that we could arrive at a value that will help out
jobs which behave poorly under current conditions, while having a
minimal effect on jobs that already perform well.  At this point, I'm
looking more to ensure that everybody likes this approach to solving the
problem before putting the finishing touches on the patches and doing
the testing to find a good middle ground.

> Is there a value that would bound the downside, so it
> is almost always smaller than the upside?

Again, the problem here is that, to find a good value, we have to know
quite a bit about why a particular value is bad for a particular job.
While, as stated above, I think we can probably find a good middle ground
to use as a default, in the end it will be the job of individual sysadmins
to determine what value works best for their particular applications, and
tune things accordingly.

- Alex
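
For readers following the thread, here is a minimal userspace sketch of the threshold semantics described above: a chunk is promoted to a THP only once some single node has faulted in `threshold` base pages from it. This is not code from the patch. `struct temp_chunk`, `chunk_init`, `chunk_fault`, `MAX_NODES`, and `PAGES_PER_THP` are hypothetical names chosen for illustration, and the per-node counting is an assumption about how "from a single node" would be tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES     4    /* hypothetical NUMA node count for this sketch */
#define PAGES_PER_THP 512  /* 2 MiB huge page / 4 KiB base pages (x86-64) */

/* Hypothetical bookkeeping for one huge-page-sized chunk; not from the patch. */
struct temp_chunk {
    int  faults[MAX_NODES]; /* base pages faulted in, counted per node */
    bool is_thp;            /* true once the chunk has been promoted */
};

void chunk_init(struct temp_chunk *c)
{
    memset(c, 0, sizeof(*c));
}

/*
 * Record one base-page fault coming from `node`.
 * Returns true if the chunk is a THP after this fault.
 * Assumption: promotion happens as soon as any single node's
 * fault count reaches `threshold`.
 */
bool chunk_fault(struct temp_chunk *c, int node, int threshold)
{
    assert(node >= 0 && node < MAX_NODES);
    assert(threshold >= 1 && threshold <= PAGES_PER_THP);

    if (c->is_thp)
        return true;            /* already promoted; nothing to count */

    c->faults[node]++;
    if (c->faults[node] >= threshold)
        c->is_thp = true;       /* one node has touched enough pages */

    return c->is_thp;
}
```

Under these assumptions, a threshold of 1 promotes on the very first fault, which matches the "maintains old behavior" comment in the quoted diff. A larger threshold leaves a chunk that is touched lightly from several nodes as small pages until one node dominates it, which is the trade-off the reply describes.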