Date: Thu, 14 Oct 2021 16:56:20 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: Andi Kleen, linux-mm@kvack.org, akpm@linux-foundation.org, Ben Widawsky,
    Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz,
    Randy Dunlap, Vlastimil Babka, Dan Williams, Huang Ying
Subject: Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy
References: <20211013094539.962357-1-aneesh.kumar@linux.ibm.com>
    <83483424-e617-51c4-d55c-6106e66e2659@linux.intel.com>
    <87pms8ymvl.fsf@linux.ibm.com>
    <49514c97-c540-48ee-0b2f-3cd7bd3dfcf9@linux.ibm.com>

On Thu 14-10-21 18:59:14, Aneesh Kumar K.V wrote:
> On 10/14/21 17:11, Michal Hocko wrote:
> > On Thu 14-10-21 15:58:29, Aneesh Kumar K.V wrote:
> > > On 10/14/21 15:08, Michal Hocko wrote:
> > [...]
> > > > Besides that it would be really great to finish the discussion about the
> > > > usecase before suggesting a new userspace API.
> > > >
> > >
> > > An application would like to hint a preferred node for allocating memory
> > > backing a va range and at the same time wants to avoid fallback to some set
> > > of nodes (in the use case I am interested in, don't fall back to slow memory
> > > nodes).
> >
> > We do have means for that, right? You can set your memory policy and
> > then set the cpu affinity to the node you want to allocate from
> > initially. You can migrate to a different cpu/node if this is not the
> > preferred affinity. Why is that not usable?
>
> For the same reason you mentioned earlier, these nodes can be cpu-less
> nodes.

It would have been easier if you were explicit about the usecase rather
than letting others guess.

> > Also think about extensibility. Say I want to allocate from a set of
> > nodes first before falling back to the rest of the nodemask? If you want
> > to add a new API then think of other potential usecases.
> >
>
> Describing the specific allocation details becomes hard with the preferred
> node being a nodemask. With the below interface
>
> SYSCALL_DEFINE5(preferred_mbind, unsigned long, start, unsigned long, len,
> 		const unsigned long __user *, preferred_nmask,
> 		const unsigned long __user *, fallback_nmask,
> 		unsigned long, maxnode)
> {
>
>
> 1. The preferred node is the first node in the preferred node mask.
> 2. Then we try to allocate from nodes present in the preferred node mask
>    that are closer to the first node in the preferred node mask.
> 3. If the above fails, we try to allocate from nodes in the fallback node
>    mask that are closer to the first node in the preferred nodemask.
>
> Isn't that too complicated? Do we have a real usecase for that?

No, I think this is a suboptimal interface.
AFAIU you really want to define a "home" node (or nodes) rather than any
policy. A home node would effectively override the default local node,
whatever policy you have, as it makes sense for both MPOL_PREFERRED_MANY
and MPOL_BIND.

Another potential interface would be set_nodeorder, which would
explicitly set the allocation fallback ordering, again agnostic of the
underlying memory policy. This would be more generic, but the question
is whether it is too generic and whether there are usecases for that.

Makes sense?
-- 
Michal Hocko
SUSE Labs