From: "Huang, Ying"
To: Mel Gorman
Cc: Peter Zijlstra, Andrew Morton, Ingo Molnar, Rik van Riel,
    Johannes Weiner, Matthew Wilcox (Oracle), Dave Hansen, Andi Kleen,
    Michal Hocko, David Rientjes, ben.widawsky@intel.com
Subject: Re: [PATCH -V2 2/2] autonuma: Migrate on fault among multiple bound nodes
References: <20201028023411.15045-1-ying.huang@intel.com>
    <20201028023411.15045-3-ying.huang@intel.com>
    <20201102111717.GB3306@suse.de>
    <87eel9wumd.fsf@yhuang-dev.intel.com>
    <20201105112523.GQ3306@suse.de>
Date: Fri, 06 Nov 2020 15:28:59 +0800
In-Reply-To: <20201105112523.GQ3306@suse.de> (Mel Gorman's message of
    "Thu, 5 Nov 2020 11:25:23 +0000")
Message-ID: <87v9ejosec.fsf@yhuang-dev.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Mel Gorman writes:

> On Wed, Nov 04, 2020 at 01:36:58PM +0800, Huang, Ying wrote:
>> But from another point of view, I suggest to remove the constraints of
>> MPOL_F_MOF in the future. If the overhead of AutoNUMA isn't acceptable,
>> why not just disable AutoNUMA globally via sysctl knob?
>>
>
> Because it's a double edged sword. NUMA Balancing can make a workload
> faster while still incurring more overhead than it should -- particularly
> when threads are involved rescanning the same or unrelated regions.
> Global disabling only really should happen when an application is running
> that is the only application on the machine and has full NUMA awareness.

Got it. So NUMA Balancing may in general benefit some workloads but hurt
some other workloads on one machine.
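As a side note, the global switch referred to above is the
kernel.numa_balancing sysctl. A minimal sketch of flipping it from a
program, assuming a kernel built with CONFIG_NUMA_BALANCING (the helper
name is only for illustration):

#include <stdio.h>

/* Enable (1) or disable (0) NUMA balancing system-wide; same effect as
 * "sysctl -w kernel.numa_balancing=N" from a shell. */
static int set_numa_balancing(int enable)
{
        FILE *f = fopen("/proc/sys/kernel/numa_balancing", "w");

        if (!f)
                return -1;
        fprintf(f, "%d\n", enable);
        return fclose(f);
}

But as you say, that knob is all-or-nothing for the whole machine.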
So we need a method to enable/disable NUMA Balancing for one workload.
Previously, this was done via the explicit NUMA policy: if an explicit
NUMA policy is specified, NUMA Balancing is disabled for the memory
region or the thread. And this can be re-enabled for a memory region via
MPOL_MF_LAZY. It appears that we still lack an equivalent of
MPOL_MF_LAZY for the thread.

>> > It might still end up being better but I was not aware of a
>> > *realistic* workload that binds to multiple nodes
>> > deliberately. Generally I expect if an application is binding, it's
>> > binding to one local node.
>>
>> Yes. It's not popular configuration for now. But for the memory
>> tiering system with both DRAM and PMEM, the DRAM and PMEM in one socket
>> will become 2 NUMA nodes. To avoid too much cross-socket memory
>> accessing, but take advantage of both the DRAM and PMEM, the workload
>> can be bound to 2 NUMA nodes (DRAM and PMEM).
>>
>
> Ok, that may lead to unpredictable performance as it'll have variable
> performance with limited control of the "important" applications that
> should use DRAM over PMEM. That's a long road but the step is not
> incompatible with the long-term goal.

Yes. Ben Widawsky is working on a patchset to make it possible to prefer
the remote DRAM instead of the local PMEM, as follows:

https://lore.kernel.org/linux-mm/20200630212517.308045-1-ben.widawsky@intel.com/
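To make the bound-to-2-nodes case above concrete, here is a minimal
sketch of what such a workload could do with the existing mempolicy API
(the DRAM and PMEM node numbers are placeholders that would have to be
discovered at run time, e.g. via libnuma or sysfs; the function name is
only for illustration):

#include <numaif.h>     /* set_mempolicy(), MPOL_BIND; link with -lnuma */
#include <stdio.h>

/* Restrict all future allocations of the calling thread to two NUMA
 * nodes, e.g. the DRAM node and the PMEM node of the local socket. */
static int bind_to_dram_and_pmem(int dram_node, int pmem_node)
{
        unsigned long nodemask = (1UL << dram_node) | (1UL << pmem_node);

        if (set_mempolicy(MPOL_BIND, &nodemask, 8 * sizeof(nodemask))) {
                perror("set_mempolicy");
                return -1;
        }
        return 0;
}

The point of this series is then to let NUMA Balancing migrate pages
among the bound nodes on fault.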
Best Regards,
Huang, Ying