From: Michal Hocko <mhocko@kernel.org>
To: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, vbabka@suse.cz,
	minchan@kernel.org, aneesh.kumar@linux.vnet.ibm.com,
	bsingharora@gmail.com, srikar@linux.vnet.ibm.com,
	haren@linux.vnet.ibm.com, jglisse@redhat.com,
	dave.hansen@intel.com, dan.j.williams@intel.com
Subject: Re: [PATCH V3 0/4] Define coherent device memory node
Date: Wed, 22 Feb 2017 10:29:21 +0100
Message-ID: <20170222092921.GF5753@dhcp22.suse.cz>
In-Reply-To: <697214d2-9e75-1b37-0922-68c413f96ef9@linux.vnet.ibm.com>

On Tue 21-02-17 18:39:17, Anshuman Khandual wrote:
> On 02/17/2017 07:02 PM, Mel Gorman wrote:
[...]
> > Why can this not be expressed with cpusets and memory policies
> > controlled by a combination of administrative steps for a privileged
> > application and an application that is CDM aware?
> 
> Hmm, that can be done but having an in kernel infrastructure has the
> following benefits.
> 
> * Administrator does not have to listen to node add notifications
>   and keep the isolation/allowed cpusets up to date all the time.
>   This can be a significant overhead on the admin/userspace which
>   have a number of separate device memory nodes.

But the application has to communicate with the device, so why can it
not use a device-specific allocation as well? I really fail to see why
something this special should hide behind a generic API, spreading all
the special casing into the kernel instead.
 
> * With cpuset solution, tasks which are part of CDM allowed cpuset
>   can have all its VMAs allocate from CDM memory which may not be
>   something the user wants. For example user may not want to have
>   the text segments, libraries allocate from CDM. To achieve this
>   the user will have to explicitly block allocation access from CDM
>   through mbind(MPOL_BIND) memory policy setups. This negative setup
>   is a big overhead. But with in kernel CDM framework, isolation is
>   enabled by default. For CDM allocations the application just has
>   to set up memory policy with CDM node in the allowed nodemask.

Which makes cpusets vs. mempolicies an even bigger mess, doesn't it? Say
that you have an application which wants to benefit from CDM and uses
mbind to get access to this memory for a particular buffer. Now you
try to run this application in a cpuset which doesn't include this node,
and now what? The cpuset will override the application policy, so the
buffer will never reach the requested node. At least not without even
more hacks to the cpuset handling. I really do not like that!
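
To make the conflict concrete, here is a toy userspace model of it. This
is only a sketch, not kernel code: toy_nodemask_t and
effective_bind_nodes() are made-up names, and the rule it encodes (a bind
policy can only use the requested nodes that the cpuset also allows) is a
simplification assumed for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* One bit per NUMA node; enough for a toy model with up to 64 nodes. */
typedef uint64_t toy_nodemask_t;

/*
 * Simplified model (an assumption, not the kernel implementation): the
 * nodes a bind policy can actually use are the requested nodes
 * restricted to the memory nodes allowed by the cpuset.
 *
 * Returns 0 and stores the usable mask in *effective, or -EINVAL when
 * nothing is left after the cpuset restriction.
 */
int effective_bind_nodes(toy_nodemask_t requested,
                         toy_nodemask_t cpuset_mems,
                         toy_nodemask_t *effective)
{
    toy_nodemask_t usable = requested & cpuset_mems;

    if (usable == 0)
        return -EINVAL;

    *effective = usable;
    return 0;
}
```

With a hypothetical CDM node 2 requested (mask 1 << 2) and a cpuset that
only allows nodes 0-1 (mask 0x3), the model returns -EINVAL: the buffer
cannot land on the node the application asked for.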

[...]
> These are the reasons which prohibit the use of HMM for coherent
> addressable device memory purpose.
> 
[...]
> (3) Application cannot directly allocate into device memory from user
> space using existing memory related system calls like mmap() and mbind()
> as the device memory hides away in ZONE_DEVICE.

Why can the application not simply use mmap on the device file?
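
A minimal sketch of what that could look like from userspace, assuming
the driver exposes a device node that supports mmap. The helper name
map_device_buffer() and the mapping offset of 0 are assumptions made for
the example; a real driver defines its own node name and layout.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of the file at `path` into the caller's address space.
 * `path` stands in for a driver's device node (hypothetical here).
 * Returns the mapping, or MAP_FAILED on error with errno set.
 */
void *map_device_buffer(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed;
     * preserve errno from mmap() across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return buf;
}
```

For a quick try without any special hardware, /dev/zero can stand in for
the device node: map_device_buffer("/dev/zero", 4096) yields a writable,
zero-filled page that is released again with munmap().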

> Apart from that, CDM framework provides a different approach to device
> memory representation which does not require special device memory kind
> of handling and associated call backs as implemented by HMM. It provides
> NUMA node based visibility to the user space which can be extended to
> support new features.

What do you mean by new features, and how will users use/request those
features (aka what is the API)?
-- 
Michal Hocko
SUSE Labs
