From: Feng Tang <feng.tang@intel.com>
To: Michal Hocko <mhocko@suse.com>,
Dave Hansen <dave.hansen@intel.com>,
Ben Widawsky <ben.widawsky@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
David Rientjes <rientjes@google.com>,
Mel Gorman <mgorman@techsingularity.net>,
Mike Kravetz <mike.kravetz@oracle.com>,
Randy Dunlap <rdunlap@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
Dave Hansen <dave.hansen@intel.com>,
Ben Widawsky <ben.widawsky@intel.com>,
Andi Kleen <ak@linux.intel.com>,
Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH v4 03/13] mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes
Date: Tue, 20 Apr 2021 15:16:25 +0800
Message-ID: <20210420071625.GB48282@shbuild999.sh.intel.com>
In-Reply-To: <YHblLevoUZ6+AvVZ@dhcp22.suse.cz>

On Wed, Apr 14, 2021 at 02:50:53PM +0200, Michal Hocko wrote:
> On Wed 17-03-21 11:40:00, Feng Tang wrote:
> > From: Dave Hansen <dave.hansen@linux.intel.com>
> >
> > MPOL_PREFERRED honors only a single node set in the nodemask. Add the
> > bare define for a new mode which will allow more than one.
> >
> > The patch does all the plumbing without actually adding the new policy
> > type.
> >
> > v2:
> > Plumb most MPOL_PREFERRED_MANY without exposing UAPI (Ben)
> > Fixes for checkpatch (Ben)
> >
> > Link: https://lore.kernel.org/r/20200630212517.308045-4-ben.widawsky@intel.com
> > Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > ---
> > mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++++++++------
> > 1 file changed, 40 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 2b1e0e4..1228d8e 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -31,6 +31,9 @@
> > * but useful to set in a VMA when you have a non default
> > * process policy.
> > *
> > + * preferred many Try a set of nodes first before normal fallback. This is
> > + * similar to preferred without the special case.
> > + *
> > * default Allocate on the local node first, or when on a VMA
> > * use the process policy. This is what Linux always did
> > * in a NUMA aware kernel and still does by, ahem, default.
> > @@ -105,6 +108,8 @@
> >
> > #include "internal.h"
> >
> > +#define MPOL_PREFERRED_MANY MPOL_MAX
> > +
> > /* Internal flags */
> > #define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0) /* Skip checks for continuous vmas */
> > #define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1) /* Invert check for nodemask */
> > @@ -175,7 +180,7 @@ struct mempolicy *get_task_policy(struct task_struct *p)
> > static const struct mempolicy_operations {
> > int (*create)(struct mempolicy *pol, const nodemask_t *nodes);
> > void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes);
> > -} mpol_ops[MPOL_MAX];
> > +} mpol_ops[MPOL_MAX + 1];
> >
> > static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
> > {
> > @@ -415,7 +420,7 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
> > mmap_write_unlock(mm);
> > }
> >
> > -static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
> > +static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = {
> > [MPOL_DEFAULT] = {
> > .rebind = mpol_rebind_default,
> > },
> > @@ -432,6 +437,10 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
> > .rebind = mpol_rebind_nodemask,
> > },
> > /* [MPOL_LOCAL] - see mpol_new() */
> > + [MPOL_PREFERRED_MANY] = {
> > + .create = NULL,
> > + .rebind = NULL,
> > + },
> > };
>
> I do get that you wanted to keep MPOL_PREFERRED_MANY unaccessible for
> the userspace but wouldn't it be much easier to simply check in two
> syscall entries rather than playing these MAX+1 games which make the
> review more complicated than necessary?

I will look into doing it this way. The handling of user input
parameters is currently quite complex.

Also, the sanity checks in kernel_mbind() and kernel_set_mempolicy()
are almost identical and can be unified.
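
To illustrate the unification, here is a rough userspace sketch, not
actual kernel code. It assumes MPOL_PREFERRED_MANY becomes a regular
enum member that is simply rejected at the syscall boundary, as
suggested above. The helper name demo_sanitize_mode() is hypothetical,
and the flag set is simplified to the two node-remapping flags.

```c
#include <errno.h>

/* Mode values modeled on include/uapi/linux/mempolicy.h. */
enum {
	MPOL_DEFAULT,
	MPOL_PREFERRED,
	MPOL_BIND,
	MPOL_INTERLEAVE,
	MPOL_LOCAL,
	MPOL_PREFERRED_MANY,	/* assumption: in the enum, but not yet UAPI */
	MPOL_MAX,		/* always last */
};

#define MPOL_F_STATIC_NODES	(1 << 15)
#define MPOL_F_RELATIVE_NODES	(1 << 14)
#define MPOL_MODE_FLAGS		(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES)

/*
 * Hypothetical shared check for both syscall entry points: split the
 * user-supplied value into mode and mode flags, then validate them.
 * On success *mode holds the bare mode and *flags the mode flags.
 */
static int demo_sanitize_mode(int *mode, unsigned short *flags)
{
	unsigned int raw = (unsigned int)*mode;
	unsigned short mode_flags = raw & MPOL_MODE_FLAGS;
	unsigned int m = raw & ~(unsigned int)MPOL_MODE_FLAGS;

	if (m >= MPOL_MAX)
		return -EINVAL;
	/* Keep the new mode unreachable from userspace for now. */
	if (m == MPOL_PREFERRED_MANY)
		return -EINVAL;
	/* The two node-remapping flags are mutually exclusive. */
	if ((mode_flags & MPOL_F_STATIC_NODES) &&
	    (mode_flags & MPOL_F_RELATIVE_NODES))
		return -EINVAL;

	*mode = (int)m;
	*flags = mode_flags;
	return 0;
}
```

With a helper of this shape, both entry points would call it once
before doing anything else, and mpol_ops[] could stay sized by
MPOL_MAX.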
> >
> > static int migrate_page_add(struct page *page, struct list_head *pagelist,
> > @@ -924,6 +933,9 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
> > case MPOL_INTERLEAVE:
> > *nodes = p->v.nodes;
> > break;
> > + case MPOL_PREFERRED_MANY:
> > + *nodes = p->v.preferred_nodes;
> > + break;
> > case MPOL_PREFERRED:
> > if (!(p->flags & MPOL_F_LOCAL))
> > *nodes = p->v.preferred_nodes;
>
> Why those two do a slightly different thing? Is this because unlike
> MPOL_PREFERRED it can never have MPOL_F_LOCAL cleared? If that is the
> case I would still stick the two together and use the same code for
> both to make the code easier to follow. Now that both use the same
> nodemask it should really be just about syscall inputs sanitization and
> to keep the original behavior for MPOL_PREFERRED.
>
> [...]

Our intention is to make MPOL_PREFERRED_MANY similar to
MPOL_PREFERRED, except that it prefers multiple nodes. We will try to
achieve this in the next version.

Also, for MPOL_LOCAL and MPOL_PREFERRED, the current code converts
MPOL_LOCAL into MPOL_PREFERRED with MPOL_F_LOCAL set. I don't
understand why it isn't done the other way around, converting
MPOL_PREFERRED with an empty nodemask into MPOL_LOCAL, which
looks more logical.
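
A rough userspace sketch of that "other way around" conversion, under
loud assumptions: the nodemask is modeled as a plain 64-bit bitmap, and
every demo_* name below is hypothetical and does not exist in
mm/mempolicy.c.

```c
#include <stdint.h>

enum demo_mode {
	DEMO_MPOL_DEFAULT,
	DEMO_MPOL_PREFERRED,
	DEMO_MPOL_LOCAL,
};

struct demo_policy {
	enum demo_mode mode;
	uint64_t nodes;		/* stand-in for nodemask_t, one bit per node */
};

/*
 * Normalize at creation time: MPOL_PREFERRED with an empty nodemask
 * becomes MPOL_LOCAL, so no separate "local" flag bit is needed.
 */
static void demo_policy_init(struct demo_policy *pol, enum demo_mode mode,
			     uint64_t nodes)
{
	if (mode == DEMO_MPOL_PREFERRED && nodes == 0)
		mode = DEMO_MPOL_LOCAL;

	pol->mode = mode;
	pol->nodes = (mode == DEMO_MPOL_LOCAL) ? 0 : nodes;
}

/* Pick the node to try first; later code only has to look at the mode. */
static int demo_first_choice_node(const struct demo_policy *pol,
				  int local_node)
{
	int node;

	if (pol->mode != DEMO_MPOL_PREFERRED)
		return local_node;

	/* Lowest-numbered node set in the preferred mask. */
	for (node = 0; node < 64; node++)
		if (pol->nodes & (UINT64_C(1) << node))
			return node;

	return local_node;
}
```

The point of the sketch is only that, once the conversion happens in
one place, users of the policy can switch on the mode alone instead of
testing a flag inside the MPOL_PREFERRED case.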

Thanks,
Feng

> > @@ -2072,6 +2087,9 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
> > task_lock(current);
> > mempolicy = current->mempolicy;
> > switch (mempolicy->mode) {
> > + case MPOL_PREFERRED_MANY:
> > + *mask = mempolicy->v.preferred_nodes;
> > + break;
> > case MPOL_PREFERRED:
> > if (mempolicy->flags & MPOL_F_LOCAL)
> > nid = numa_node_id();
>
> Same here