From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: Ying Huang <ying.huang@intel.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Greg Thelen <gthelen@google.com>, Yang Shi <shy828301@gmail.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Tim C Chen <tim.c.chen@intel.com>,
	Brice Goglin <brice.goglin@gmail.com>,
	Michal Hocko <mhocko@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Hesham Almatary <hesham.almatary@huawei.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Alistair Popple <apopple@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Feng Tang <feng.tang@intel.com>,
	Jagdish Gediya <jvgediya@linux.ibm.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [RFC PATCH v4 7/7] mm/demotion: Demote pages according to allocation fallback order
Date: Mon, 6 Jun 2022 09:37:26 +0530
Message-ID: <a7d3829e-8bc5-d7a8-5e9e-a7943bb50740@linux.ibm.com>
In-Reply-To: <9f6e60cc8be3cbde4871458c612c5c31d2a9e056.camel@intel.com>

On 6/6/22 6:13 AM, Ying Huang wrote:
> On Fri, 2022-06-03 at 20:39 +0530, Aneesh Kumar K V wrote:
>> On 6/2/22 1:05 PM, Ying Huang wrote:
>>> On Fri, 2022-05-27 at 17:55 +0530, Aneesh Kumar K.V wrote:
>>>> From: Jagdish Gediya <jvgediya@linux.ibm.com>
>>>>
>>>> Currently, a higher tier node can only be demoted to selected
>>>> nodes on the next lower tier as defined by the demotion path,
>>>> not any other node from any lower tier.  This strict, hard-coded
>>>> demotion order does not work in all use cases (e.g. some use cases
>>>> may want to allow cross-socket demotion to another node in the same
>>>> demotion tier as a fallback when the preferred demotion node is out
>>>> of space). This demotion order is also inconsistent with the page
>>>> allocation fallback order when all the nodes in a higher tier are
>>>> out of space: The page allocation can fall back to any node from any
>>>> lower tier, whereas the demotion order doesn't allow that currently.
>>>>
>>>> This patch adds support for getting the mask of all allowed demotion
>>>> targets for a node. demote_page_list() is also modified to use this
>>>> allowed node mask by filling it into the migration_target_control
>>>> structure before passing it to migrate_pages().
>>>
>>
>> ...
>>
>>>>     * Take pages on @demote_list and attempt to demote them to
>>>>     * another node.  Pages which are not demoted are left on
>>>> @@ -1481,6 +1464,19 @@ static unsigned int demote_page_list(struct list_head *demote_pages,
>>>>    {
>>>>    	int target_nid = next_demotion_node(pgdat->node_id);
>>>>    	unsigned int nr_succeeded;
>>>> +	nodemask_t allowed_mask;
>>>> +
>>>> +	struct migration_target_control mtc = {
>>>> +		/*
>>>> +		 * Allocate from 'node', or fail quickly and quietly.
>>>> +		 * When this happens, 'page' will likely just be discarded
>>>> +		 * instead of migrated.
>>>> +		 */
>>>> +		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
>>>> +			__GFP_NOMEMALLOC | GFP_NOWAIT,
>>>> +		.nid = target_nid,
>>>> +		.nmask = &allowed_mask
>>>> +	};
>>>
>>> IMHO, we should first try to allocate from the preferred node (which will
>>> kick kswapd of the preferred node if necessary).  If that fails, we
>>> fall back to all allowed nodes.
>>>
>>> As we discussed here:
>>>
>>> https://lore.kernel.org/lkml/69f2d063a15f8c4afb4688af7b7890f32af55391.camel@intel.com/
>>>
>>> That is, something like below,
>>>
>>> static struct page *alloc_demote_page(struct page *page, unsigned long node)
>>> {
>>> 	struct page *target_page;
>>> 	/* Not initialized in this sketch; must hold the allowed demotion targets. */
>>> 	nodemask_t allowed_mask;
>>> 	struct migration_target_control mtc = {
>>> 		/*
>>> 		 * Allocate from 'node', or fail quickly and quietly.
>>> 		 * When this happens, 'page' will likely just be discarded
>>> 		 * instead of migrated.
>>> 		 */
>>> 		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
>>> 			    __GFP_THISNODE  | __GFP_NOWARN |
>>> 			    __GFP_NOMEMALLOC | GFP_NOWAIT,
>>> 		.nid = node
>>> 	};
>>>
>>> 	/* First attempt: the preferred node only. */
>>> 	target_page = alloc_migration_target(page, (unsigned long)&mtc);
>>> 	if (target_page)
>>> 		return target_page;
>>>
>>> 	/* Fallback: any node in the allowed mask. */
>>> 	mtc.gfp_mask &= ~__GFP_THISNODE;
>>> 	mtc.nmask = &allowed_mask;
>>>
>>> 	return alloc_migration_target(page, (unsigned long)&mtc);
>>> }
>>
>> I skipped doing this in v5 because I was not sure this is really what we
>> want.
> 
> I think so.  And this is the original behavior.  We should keep the
> original behavior as much as possible, then make changes if necessary.
> 

That is the reason I split the new page allocation out into a separate
patch. The previous discussion on this topic did not reach a conclusion
on whether we really need to do the above:
https://lore.kernel.org/lkml/CAAPL-u9endrWf_aOnPENDPdvT-2-YhCAeJ7ONGckGnXErTLOfQ@mail.gmail.com/

Based on that discussion, I looked at avoiding the __GFP_THISNODE
allocation. If you have experimental results that suggest otherwise,
could you share them? I could summarize them in the commit message to
better explain why enforcing __GFP_THISNODE is needed.
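
To make the two behaviours being compared concrete, here is a small
userspace model of the "preferred node first, then allowed mask" order
from the quoted alloc_demote_page() sketch. This is only an illustration,
not kernel code: free_pages[], alloc_on_node() and alloc_demote_target()
are hypothetical names, and a plain bitmask stands in for nodemask_t.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical per-node free page counters: node 2 has 1 page, node 3 has 2. */
static int free_pages[MAX_NODES] = { 0, 0, 1, 2, 0, 0, 0, 0 };

/* Take one page from exactly this node; never look at other nodes. */
static bool alloc_on_node(int nid)
{
	if (free_pages[nid] == 0)
		return false;
	free_pages[nid]--;
	return true;
}

/*
 * Step 1 models the __GFP_THISNODE attempt on the preferred node.
 * Step 2 models clearing __GFP_THISNODE and allowing every node whose
 * bit is set in allowed_mask.
 * Returns the node that supplied the page, or -1 if none could.
 */
static int alloc_demote_target(int preferred, unsigned int allowed_mask)
{
	assert(preferred >= 0 && preferred < MAX_NODES);

	if (alloc_on_node(preferred))
		return preferred;

	for (int nid = 0; nid < MAX_NODES; nid++) {
		if ((allowed_mask & (1u << nid)) && alloc_on_node(nid))
			return nid;
	}
	return -1;
}
```

With node 2 preferred and nodes 2 and 3 allowed, the first call is served
by node 2 and later calls spill over to node 3 until it is empty too. The
model walks the mask in node order, which is a simplification and says
nothing about the order the real allocator would use.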

>> I guess we can do this as part of the change that is going to
>> introduce the usage of memory policy for the allocation?
> 
> Like the memory allocation policy, the default policy should be local
> preferred.  We shouldn't force users to use explicit memory policy for
> that.
> 
> And the added code isn't complex.
> 

-aneesh

