linux-kernel.vger.kernel.org archive mirror
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Wang Wensheng <wangwensheng4@huawei.com>,
	David Hildenbrand <david@redhat.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"osalvador@suse.de" <osalvador@suse.de>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"rui.xiang@huawei.com" <rui.xiang@huawei.com>,
	"HAGIO KAZUHITO(萩尾 一仁)" <k-hagio-ab@nec.com>
Subject: Re: [PATCH] mm/sparse: Fix flags overlap in section_mem_map
Date: Wed, 23 Jun 2021 23:09:40 +0000
Message-ID: <20210623230939.GA2963480@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <e838d8b5-84f1-5532-6f22-e4b729124e1c@redhat.com>

On Tue, Apr 27, 2021 at 11:05:17AM +0200, David Hildenbrand wrote:
> On 27.04.21 10:30, Wang Wensheng wrote:
> > The section_mem_map member of struct mem_section stores some flags and
> > the address of the struct page array of the mem_section.
> > 
> > Additionally, the node id of the mem_section is stored there during early
> > boot, when the struct page array has not yet been allocated. In other
> > words, the higher bits of section_mem_map are used for two purposes, and
> > the node id should be cleared properly after early boot.
> > 
> > Currently the node id field overlaps with the flag field and cannot be
> > cleared properly. The overlapping bits are then treated as mem_section
> > flags and may lead to unexpected side effects.
> > 
> > The overlap occurs because SECTION_NID_SHIFT was not increased when new
> > mem_section flags were added. Define SECTION_NID_SHIFT using
> > order_base_2() to ensure that the node id field is always located above
> > the flags field.
> > 
> > Fixes: 326e1b8f83a4 ("mm/sparsemem: introduce a SECTION_IS_EARLY flag")
> > Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
> > ---
> >   include/linux/mmzone.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 47946ce..b01694d 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1325,7 +1325,7 @@ extern size_t mem_section_usage_size(void);
> >   #define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
> >   #define SECTION_MAP_LAST_BIT		(1UL<<5)
> >   #define SECTION_MAP_MASK		(~(SECTION_MAP_LAST_BIT-1))
> > -#define SECTION_NID_SHIFT		3
> > +#define SECTION_NID_SHIFT		order_base_2(SECTION_MAP_LAST_BIT)
> >   static inline struct page *__section_mem_map_addr(struct mem_section *section)
> >   {
> > 
> 
> Well, all sections around during boot that have an early NID are early ...
> so it's not an issue with SECTION_IS_EARLY, no? I mean, it's ugly, but not
> broken.
> 
> But it's an issue with SECTION_TAINT_ZONE_DEVICE, AFAICT.
> sparse_init_one_section() would leave the bit set if the nid happens to have
> that bit set (e.g., node 2,3). It's semi-broken then, because we force all
> pfn_to_online_page() through the slow path.
> 
> 
> That whole section flag setting code is fragile.

Hi Wensheng, David,

This patch does not seem to have been accepted or updated yet, so what is
its status?

We recently hit this exact issue while testing the crash utilities with the
latest kernels on a 4-node NUMA system.  SECTION_TAINT_ZONE_DEVICE seems to
be set wrongly, and makedumpfile can fail because of it. So we need a fix.
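
The overlap can be reproduced outside the kernel. The following is a minimal
userspace sketch, not kernel code: the flag values are copied from the quoted
mmzone.h hunk, while OLD_SECTION_NID_SHIFT and stale_flag_bits() are
hypothetical names introduced only for illustration. It assumes, as described
above, that the early nid is stored as nid << SECTION_NID_SHIFT and that only
the bits below SECTION_MAP_LAST_BIT survive sparse_init_one_section().

```c
/* Flag values copied from the quoted include/linux/mmzone.h hunk. */
#define SECTION_IS_EARLY           (1UL << 3)
#define SECTION_TAINT_ZONE_DEVICE  (1UL << 4)
#define SECTION_MAP_LAST_BIT       (1UL << 5)
#define SECTION_MAP_MASK           (~(SECTION_MAP_LAST_BIT - 1))

/* Hypothetical name for the pre-fix value of SECTION_NID_SHIFT. */
#define OLD_SECTION_NID_SHIFT      3

/*
 * Hypothetical helper: return the flag bits that an early node id leaves
 * behind when only the bits covered by SECTION_MAP_MASK are cleared.
 */
static unsigned long stale_flag_bits(unsigned long nid)
{
	unsigned long early = nid << OLD_SECTION_NID_SHIFT;

	/* Bits below SECTION_MAP_LAST_BIT are kept and read back as flags. */
	return early & ~SECTION_MAP_MASK;
}
```

With these definitions, stale_flag_bits(2) and stale_flag_bits(3) both contain
SECTION_TAINT_ZONE_DEVICE, which matches the "node 2,3" case mentioned above,
while node 0 leaves no stale bits.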

I thought of another approach, shown below, before finding this thread.
If you are fine with it, I'd like to submit it as a patch. If you prefer
the order_base_2() approach, I'm glad to ack this patch.

  --- a/include/linux/mmzone.h
  +++ b/include/linux/mmzone.h
  @@ -1358,14 +1358,15 @@ extern size_t mem_section_usage_size(void);
    *      which results in PFN_SECTION_SHIFT equal 6.
    * To sum it up, at least 6 bits are available.
    */
  +#define SECTION_MAP_LAST_SHIFT         5
   #define SECTION_MARKED_PRESENT         (1UL<<0)
   #define SECTION_HAS_MEM_MAP            (1UL<<1)
   #define SECTION_IS_ONLINE              (1UL<<2)
   #define SECTION_IS_EARLY               (1UL<<3)
   #define SECTION_TAINT_ZONE_DEVICE      (1UL<<4)
  -#define SECTION_MAP_LAST_BIT           (1UL<<5)
  +#define SECTION_MAP_LAST_BIT           (1UL<<SECTION_MAP_LAST_SHIFT)
   #define SECTION_MAP_MASK               (~(SECTION_MAP_LAST_BIT-1))
  -#define SECTION_NID_SHIFT              3
  +#define SECTION_NID_SHIFT              SECTION_MAP_LAST_SHIFT
  
   static inline struct page *__section_mem_map_addr(struct mem_section *section)
   {
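
The effect of tying SECTION_NID_SHIFT to SECTION_MAP_LAST_SHIFT can be checked
with a small userspace sketch. The macros mirror the diff above;
encode_early_nid() and decode_early_nid() are hypothetical stand-ins written
for this example, not the kernel's own helpers, and the static assertion is an
illustrative addition rather than part of the proposed patch.

```c
/* Layout mirroring the proposed diff. */
#define SECTION_MAP_LAST_SHIFT     5
#define SECTION_TAINT_ZONE_DEVICE  (1UL << 4)
#define SECTION_MAP_LAST_BIT       (1UL << SECTION_MAP_LAST_SHIFT)
#define SECTION_MAP_MASK           (~(SECTION_MAP_LAST_BIT - 1))
#define SECTION_NID_SHIFT          SECTION_MAP_LAST_SHIFT

/* Illustrative compile-time check: the nid field starts above all flags. */
_Static_assert(((1UL << SECTION_NID_SHIFT) & ~SECTION_MAP_MASK) == 0,
	       "node id field must not overlap the flag bits");

/* Hypothetical stand-in: place the node id above the flag bits. */
static unsigned long encode_early_nid(unsigned long nid)
{
	return nid << SECTION_NID_SHIFT;
}

/* Hypothetical stand-in: recover the node id, ignoring any flag bits. */
static unsigned long decode_early_nid(unsigned long section_mem_map)
{
	return section_mem_map >> SECTION_NID_SHIFT;
}
```

Because both macros derive from one shift value, adding a flag only requires
bumping SECTION_MAP_LAST_SHIFT, and no node id can leave a stale flag bit
behind.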

Thanks,
Naoya Horiguchi

Thread overview: 7+ messages
2021-04-27  8:30 [PATCH] mm/sparse: Fix flags overlap in section_mem_map Wang Wensheng
2021-04-27  9:05 ` David Hildenbrand
2021-06-23 23:09   ` HORIGUCHI NAOYA(堀口 直也) [this message]
2021-06-25 21:23     ` Dan Williams
2021-06-28  7:06       ` [PATCH] mm/sparse: set SECTION_NID_SHIFT to 6 Naoya Horiguchi
2021-07-06  8:36         ` David Hildenbrand
2021-07-07  4:54           ` Naoya Horiguchi
