From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 16 Mar 2017 21:51:22 +0800
From: Aaron Lu
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dave Hansen,
	Tim Chen, Andrew Morton, Ying Huang
Subject: Re: [PATCH v2 0/5] mm: support parallel free of memory
Message-ID: <20170316135122.GF13054@aaronlu.sh.intel.com>
References: <1489568404-7817-1-git-send-email-aaron.lu@intel.com>
 <20170315141813.GB32626@dhcp22.suse.cz>
 <20170315154406.GF2442@aaronlu.sh.intel.com>
 <20170315162843.GA27197@dhcp22.suse.cz>
 <20170316073403.GE1661@aaronlu.sh.intel.com>
In-Reply-To: <20170316073403.GE1661@aaronlu.sh.intel.com>

On Thu, Mar 16, 2017 at 03:34:03PM +0800, Aaron Lu wrote:
> On Wed, Mar 15, 2017 at 05:28:43PM +0100, Michal Hocko wrote:
> ... ...
> > After all, the amount of work to be done is the same; we just risk
> > more lock contention, unexpected CPU usage, etc.
>
> I'm starting to realize this is a good question.
>
> I guess max_active=4 produced almost the best result (max_active=8 is
> only slightly better) because the test box is a 4-node machine, so
> there are 4 zone->locks to contend for (ignoring the tiny zones only
> available in node 0).
>
> I'm going to test on an EP to see if max_active=2 suffices to produce
> a good enough result. If so, the proper default should be the number
> of nodes.

Here are the test results on a 2-node EP with 128GiB memory; the test
size is 100GiB.

max_active    time
vanilla       2.971s ±3.8%
2             1.699s ±13.7%
4             1.616s ±3.1%
8             1.642s ±0.9%

So 4 gives the best result, but 2 is probably good enough.

If the size each worker deals with is changed from 1G to 2G:

max_active    time
2             1.605s ±1.7%
4             1.639s ±1.2%
8             1.626s ±1.8%

Considering that we are mostly improving things for memory-intensive
apps, the default setting should probably be:

	max_active = node_number

with each worker freeing 2G of memory.
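
For reference, here is a minimal sketch (assumed, not taken from the
series: the workqueue name, pfree_wq and pfree_wq_init are made up for
illustration) of how such a default could be wired up with the stock
workqueue API, capping concurrency at the number of online NUMA nodes:

	#include <linux/workqueue.h>
	#include <linux/nodemask.h>

	static struct workqueue_struct *pfree_wq;

	static int __init pfree_wq_init(void)
	{
		/*
		 * One concurrent worker per node: each node has its own
		 * zone->lock, so running more workers than nodes mostly
		 * adds lock contention, matching the numbers above.
		 */
		pfree_wq = alloc_workqueue("parallel_free", WQ_UNBOUND,
					   num_online_nodes());
		return pfree_wq ? 0 : -ENOMEM;
	}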
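
And a similarly hypothetical chunking loop (struct pfree_work,
pfree_work_fn and PFREE_CHUNK are illustrative names, not from the
patches) showing how the range to be freed could be split into 2G
pieces, one work item per piece, so the per-worker unit matches the
second table:

	#include <linux/kernel.h>
	#include <linux/slab.h>
	#include <linux/workqueue.h>

	#define PFREE_CHUNK	(2UL << 30)	/* 2G per work item */

	struct pfree_work {
		struct work_struct work;
		unsigned long start;	/* range handed to this worker */
		unsigned long len;
	};

	static void pfree_work_fn(struct work_struct *work)
	{
		struct pfree_work *pw =
			container_of(work, struct pfree_work, work);

		/* free the pages in [pw->start, pw->start + pw->len) here */
		kfree(pw);
	}

	static void queue_parallel_free(unsigned long start, unsigned long size)
	{
		unsigned long off;

		for (off = 0; off < size; off += PFREE_CHUNK) {
			struct pfree_work *pw = kmalloc(sizeof(*pw), GFP_KERNEL);

			if (!pw)
				break;	/* fall back to freeing synchronously */

			pw->start = start + off;
			pw->len = min(PFREE_CHUNK, size - off);
			INIT_WORK(&pw->work, pfree_work_fn);
			queue_work(pfree_wq, &pw->work);
		}
	}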