From: Fengguang Wu <fengguang.wu@intel.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	LKML <linux-kernel@vger.kernel.org>,
	linux-nvme@lists.infradead.org
Subject: Re: [LSF/MM ATTEND ] memory reclaim with NUMA rebalancing
Date: Sat, 23 Feb 2019 21:42:26 +0800	[thread overview]
Message-ID: <20190223134226.spesmpw6qnnfyvrr@wfg-t540p.sh.intel.com> (raw)
In-Reply-To: <20190223132748.awedzeybi6bjz3c5@wfg-t540p.sh.intel.com>

On Sat, Feb 23, 2019 at 09:27:48PM +0800, Fengguang Wu wrote:
>On Thu, Jan 31, 2019 at 12:19:47PM +0530, Aneesh Kumar K.V wrote:
>>Michal Hocko <mhocko@kernel.org> writes:
>>
>>> Hi,
>>> I would like to propose the following topic for the MM track. Different
>>> groups of people would like to use NVDIMMs as a low-cost & slower memory
>>> which is presented to the system as a NUMA node. We do have a NUMA API
>>> but it doesn't really fit the "balance the memory between nodes" needs.
>>> People would like to have hot pages in the regular RAM while cold pages
>>> might be at lower speed NUMA nodes. We do have NUMA balancing for the
>>> promotion path but there is nothing for the other direction. Can we
>>> start considering memory reclaim to move pages to more distant and idle
>>> NUMA nodes rather than reclaim them? There are certainly details that
>>> will get quite complicated but I guess it is time to start discussing
>>> this at least.
>>
>>I would be interested in this topic too. I would like to understand
>
>So would I. I'd be glad to take part in the discussions if I can attend the slot.
>
>>the API and how it can help exploit the different type of devices we
>>have on OpenCAPI.
>>
>>IMHO there are a few proposals related to this which we could discuss together:
>>
>>1. The HMAT series, which wants to expose these devices as NUMA nodes.
>>2. The patch series from Dave Hansen, which just uses PMEM as a NUMA node.
>>3. The patch series from Fengguang Wu, which prevents default
>>allocation from these NUMA nodes by excluding them from the zonelists.
>>4. The patch series from Jerome Glisse, which doesn't expose these as
>>NUMA nodes.
>>
>>IMHO (3) suggests that we really don't want them as NUMA nodes. But
>>since NUMA is the only interface we currently have to present them as
>>memory and to control allocation and migration, we are forcing
>>ourselves into NUMA nodes and then excluding them from default allocation.
>
>Regarding (3), we actually made a default policy choice of
>"separating fallback zonelists for PMEM/DRAM nodes" for typical
>use scenarios.
>
>In the long term, it's better not to build such an assumption into the
>kernel. There may well be workloads that are cost sensitive rather than
>performance sensitive. Suppose people buy a machine with tiny DRAM and
>large PMEM, in which case the suitable policy may be to:
>
>1) prefer (but not bind) slab and other kernel pages in DRAM
>2) allocate LRU and other pages from either the DRAM or PMEM node

The point is that, in this case, the fallback zonelists should not be
separated for DRAM and PMEM.

>In summary, the kernel may offer the flexibility of different policies
>for different users. PMEM has different characteristics compared to
>DRAM; depending on policy, it may or may not be treated differently
>from DRAM.
>
>Thanks,
>Fengguang


Thread overview: 19+ messages
2019-01-30 17:48 [LSF/MM TOPIC] memory reclaim with NUMA rebalancing Michal Hocko
2019-01-30 18:12 ` Keith Busch
2019-01-30 23:53 ` Yang Shi
2019-01-31  6:49 ` [LSF/MM ATTEND ] " Aneesh Kumar K.V
2019-02-06 19:03   ` Christopher Lameter
2019-02-22 13:48     ` Jonathan Cameron
2019-02-22 14:12     ` Larry Woodman
2019-02-23 13:27   ` Fengguang Wu
2019-02-23 13:42     ` Fengguang Wu [this message]
