All of lore.kernel.org
 help / color / mirror / Atom feed
From: Huaisheng HS1 Ye <yehs1@lenovo.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"mgorman@techsingularity.net" <mgorman@techsingularity.net>,
	"pasha.tatashin@oracle.com" <pasha.tatashin@oracle.com>,
	"alexander.levin@verizon.com" <alexander.levin@verizon.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"penguin-kernel@I-love.SAKURA.ne.jp"
	<penguin-kernel@I-love.SAKURA.ne.jp>,
	"colyli@suse.de" <colyli@suse.de>,
	NingTing Cheng <chengnt@lenovo.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [External]  Re: [PATCH 2/3] include/linux/gfp.h: use unsigned int in gfp_zone
Date: Tue, 8 May 2018 00:25:31 +0000	[thread overview]
Message-ID: <HK2PR03MB1684F0E4DFF0184486BB285B929A0@HK2PR03MB1684.apcprd03.prod.outlook.com> (raw)
In-Reply-To: <20180507184410.GA12361@bombadil.infradead.org>


> On Mon, May 07, 2018 at 05:16:50PM +0000, Huaisheng HS1 Ye wrote:
> > I hope it couldn't cause problem, but based on my analyzation it has the
> potential to go wrong if users still use the flags as usual, which are __GFP_DMA,
> __GFP_DMA32 and __GFP_HIGHMEM.
> > Let me take an example with my testing platform, these logics are much
> abstract, an example will be helpful.
> >
> > There is a two sockets X86_64 server, No HIGHMEM and it has 16 + 16GB
> memories.
> > Its zone types shall be like this below,
> >
> > ZONE_DMA		0		0b0000
> > ZONE_DMA32		1		0b0001
> > ZONE_NORMAL		2		0b0010
> > (OPT_ZONE_HIGHMEM)	2		0b0010
> > ZONE_MOVABLE		3		0b0011
> > ZONE_DEVICE		4		0b0100 (virtual zone)
> > __MAX_NR_ZONES	5
> >
> > __GFP_DMA	= ZONE_DMA ^ ZONE_NORMAL= 0b0010
> > __GFP_DMA32	= ZONE_DMA32 ^ ZONE_NORMAL= 0b0011
> > __GFP_HIGHMEM = OPT_ZONE_HIGHMEM ^ ZONE_NORMAL = 0b0000
> > __GFP_MOVABLE	= ZONE_MOVABLE ^ ZONE_NORMAL |
> ___GFP_MOVABLE = 0b1001
> >
> > Eg.
> > If a driver uses flags like this below,
> > Step 1:
> > gfp_mask  |  __GFP_DMA32;
> > (0b 0000		|	0b 0011	= 0b 0011)
> > gfp_mask's low four bits shall equal to 0011, assuming no __GFP_MOVABLE
> >
> > Step 2:
> > gfp_mask  & ~__GFP_DMA;
> > (0b 0011	 & ~0b0010   = 0b0001)
> > gfp_mask's low four bits shall equal to 0001 now, then when it enter
> gfp_zone(),
> >
> > return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL;
> > (0b0001 ^ 0b0010 = 0b0011)
> > You know 0011 means that ZONE_MOVABLE will be returned.
> > In this case, error can be found, because gfp_mask needs to get
> ZONE_DMA32 originally.
> > But with existing GFP_ZONE_TABLE/BAD, it is correct. Because the bits are
> way of 0x1, 0x2, 0x4, 0x8
> 
> Yes, I understand your point here.  My point was that this was already a bug;
> the caller shouldn't simply be clearing __GFP_DMA; they really mean to clear
> all of the GFP_ZONE bits so that they allocate from ZONE_NORMAL.  And for
> that, they should be using ~GFP_ZONEMASK
That is great, if they can follow this principle, I don't worry it. Maybe I am too cautious.

> 
> Unless they already know, of course.  For example, this one in
> arch/x86/mm/pgtable.c is fine:
> 
>         if (strcmp(arg, "nohigh") == 0)
>                 __userpte_alloc_gfp &= ~__GFP_HIGHMEM;
> 
> because it knows that __userpte_alloc_gfp can only have __GFP_HIGHMEM set.
> 
> But something like btrfs should almost certainly be using ~GFP_ZONEMASK.


> > > +#define __GFP_HIGHMEM  ((__force gfp_t)OPT_ZONE_HIGHMEM ^
> > > ZONE_NORMAL)
> > > -#define __GFP_MOVABLE  ((__force gfp_t)___GFP_MOVABLE)  /*
> > > ZONE_MOVABLE allowed */
> > > +#define __GFP_MOVABLE  ((__force gfp_t)ZONE_MOVABLE ^
> > > ZONE_NORMAL | \
> > > +					___GFP_MOVABLE)
> > >
> > > Then I think you can just make it:
> > >
> > > static inline enum zone_type gfp_zone(gfp_t flags)
> > > {
> > > 	return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL;
> > > }
> > Sorry, I think it has risk in this way, let me introduce a failure case for
> example.
> >
> > Now suppose that, there is a flag should represent DMA flag with movable.
> > It should be like this below,
> > __GFP_DMA | __GFP_MOVABLE
> > (0b 0010       |   0b 1001   = 0b 1011)
> > Normally, gfp_zone shall return ZONE_DMA but with MOVABLE policy, right?
> 
> No, if you somehow end up with __GFP_MOVABLE | __GFP_DMA, it should give
> you ZONE_DMA.
Exactly, it should return ZONE_DMA, that's what I thought.

> 
> > But with your code, gfp_zone will return ZONE_DMA32 with MOVABLE
> >policy.
> > (0b 1011  ^  0b 0010 = 1001)
> 
> ___GFP_ZONE_MASK is 0x7, so it excludes __GFP_MOVABLE.
Sorry, I made a mistake here. I rewrite it as below.

((__GFP_DMA | __GFP_MOVABLE) & ___GFP_ZONE_MASK)
   ((0b 0010  |  0b 1001  = 0b 1011) & 0b 0111)	= 0b 0011

0b 0011 ^ 0b 0010 = 0b 0001
So ZONE_DMA32 will be returned, but what user needs is ZONE_DMA.

Thanks,
Huaisheng

      parent reply	other threads:[~2018-05-08  0:25 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-04  6:52 [PATCH 0/3] Some fixes for mm code optimization Huaisheng Ye
     [not found] ` <1525416729-108201-4-git-send-email-yehs1@lenovo.com>
2018-05-04 13:18   ` [PATCH 3/3] mm/page_alloc: Fix typo in debug info of calculate_node_totalpages Michal Hocko
2018-05-05  2:10     ` [External] " Huaisheng HS1 Ye
2018-05-09  7:48       ` Michal Hocko
     [not found] ` <1525416729-108201-3-git-send-email-yehs1@lenovo.com>
2018-05-04 13:35   ` [PATCH 2/3] include/linux/gfp.h: use unsigned int in gfp_zone Michal Hocko
2018-05-04 15:40     ` Matthew Wilcox
2018-05-04 17:50       ` [External] " Huaisheng HS1 Ye
2018-05-06  9:32       ` Huaisheng HS1 Ye
2018-05-06 13:48         ` Matthew Wilcox
2018-05-06 16:17           ` Huaisheng HS1 Ye
2018-05-06 18:55             ` Matthew Wilcox
2018-05-07 17:16               ` Huaisheng HS1 Ye
2018-05-07 18:44                 ` Matthew Wilcox
2018-05-07 21:25                   ` David Sterba
2018-05-08  0:25                     ` Matthew Wilcox
2018-05-09  9:36                       ` David Sterba
2018-05-15 11:54                         ` Matthew Wilcox
2018-05-21 17:06                           ` David Sterba
2018-05-09 14:57                     ` Huaisheng HS1 Ye
2018-05-08  0:25                   ` Huaisheng HS1 Ye [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=HK2PR03MB1684F0E4DFF0184486BB285B929A0@HK2PR03MB1684.apcprd03.prod.outlook.com \
    --to=yehs1@lenovo.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.levin@verizon.com \
    --cc=chengnt@lenovo.com \
    --cc=colyli@suse.de \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=pasha.tatashin@oracle.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.