Date: Thu, 4 Mar 2021 13:57:55 +0100
From: Michal Hocko
To: Ben Widawsky
Cc: Feng Tang, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
 Andrea Arcangeli, David Rientjes, Mel Gorman, Mike Kravetz, Randy Dunlap,
 Vlastimil Babka, Dave Hansen, Andi Kleen, Dan J. Williams
Subject: Re: [PATCH v3 RFC 14/14] mm: speedup page alloc for MPOL_PREFERRED_MANY by adding a NO_SLOWPATH gfp bit
In-Reply-To: <20210303172250.wbp47skyuf6r37wi@intel.com>
References: <20210303120717.GA16736@shbuild999.sh.intel.com>
 <20210303121833.GB16736@shbuild999.sh.intel.com>
 <20210303131832.GB78458@shbuild999.sh.intel.com>
 <20210303134644.GC78458@shbuild999.sh.intel.com>
 <20210303163141.v5wu2sfo2zj2qqsw@intel.com>
 <20210303172250.wbp47skyuf6r37wi@intel.com>

On Wed 03-03-21 09:22:50, Ben
Widawsky wrote:
> On 21-03-03 18:14:30, Michal Hocko wrote:
> > On Wed 03-03-21 08:31:41, Ben Widawsky wrote:
> > > On 21-03-03 14:59:35, Michal Hocko wrote:
> > > > On Wed 03-03-21 21:46:44, Feng Tang wrote:
> > > > > On Wed, Mar 03, 2021 at 09:18:32PM +0800, Tang, Feng wrote:
> > > > > > On Wed, Mar 03, 2021 at 01:32:11PM +0100, Michal Hocko wrote:
> > > > > > > On Wed 03-03-21 20:18:33, Feng Tang wrote:
> > > > [...]
> > > > > > > > One thing I tried which can fix the slowness is:
> > > > > > > >
> > > > > > > >   + gfp_mask &= ~(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM);
> > > > > > > >
> > > > > > > > which explicitly clears the 2 kinds of reclaim. I thought it was
> > > > > > > > too hacky, so I didn't mention it in the commit log.
> > > > > > >
> > > > > > > Clearing __GFP_DIRECT_RECLAIM would be the right way to achieve
> > > > > > > GFP_NOWAIT semantics. Why would you want to exclude kswapd as well?
> > > > > >
> > > > > > When I tried gfp_mask &= ~__GFP_DIRECT_RECLAIM, the slowness couldn't
> > > > > > be fixed.
> > > > >
> > > > > I just double-checked by rerunning the test: 'gfp_mask &= ~__GFP_DIRECT_RECLAIM'
> > > > > also accelerates the allocation a lot, though it is still a little slower
> > > > > than this patch. It seems I messed up some of the earlier runs; sorry for
> > > > > the confusion!
> > > > >
> > > > > Could this be used as the solution? Or should we add another
> > > > > fallback_nodemask? The latter would change the current API quite a bit.
> > > >
> > > > I haven't got to the whole series yet. The real question is whether the
> > > > first attempt to enforce the preferred mask is a general win. I would
> > > > argue that it resembles the existing single-node preferred memory policy,
> > > > because that one doesn't push heavily on the preferred node either. So
> > > > dropping just the direct reclaim mode makes some sense to me.
> > > >
> > > > IIRC this is something I was recommending in an early proposal of the
> > > > feature.
> > > My assumption [FWIW] is that the usecases we've outlined for multi-preferred
> > > would want heavier pushing on the preference mask. However, maybe the uapi
> > > could dictate how hard to try/not try.
> >
> > What does that mean, and what is the expectation from the kernel that is to
> > be more or less cast in stone?
>
> (I'm not positive I've understood your question, so correct me if I
> misunderstood.)
>
> I'm not sure there is a stone-cast way to define it, nor should we.

OK, I thought you wanted the behavior to diverge from the existing
MPOL_PREFERRED, which only prefers the configured node as a default while the
allocator is free to fall back to any other node under memory pressure. For
multiple preferred nodes the same should apply: only a lightweight attempt is
made before falling back to the full nodeset. The paragraph I was replying to
is not in line with this, though.

> At the very least though, something in uapi that has a general mapping to GFP
> flags (specifically around reclaim) for the first round of allocation could
> make sense.

I do not think this is a good idea.

> In my head there are 3 levels of request possible for multiple nodes:
> 1. BIND: Those nodes or die.
> 2. Preferred hard: Those nodes and I'm willing to wait. Fall back if impossible.
> 3. Preferred soft: Those nodes but I don't want to wait.

I do agree that an intermediate "preference" can be helpful, because binding is
just too strict and its OOM semantics are far from ideal. But this would need a
new policy.

> Current UAPI in the series doesn't define a distinction between 2 and 3. As I
> understand the change, Feng is defining the behavior to be #3, which makes #2
> not an option. I sort of punted on defining it entirely in the beginning.

I really think it should be in line with the existing preferred policy
behavior.
-- 
Michal Hocko
SUSE Labs