From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: akpm@linux-foundation.org, dan.j.williams@intel.com,
	pasha.tatashin@soleen.com, mhocko@suse.com,
	anshuman.khandual@arm.com, Jonathan.Cameron@huawei.com,
	vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type
Date: Fri, 26 Jul 2019 11:41:46 +0200	[thread overview]
Message-ID: <dbd19aea-fe18-ec42-7932-f03109cb399e@redhat.com> (raw)
In-Reply-To: <20190726092548.GA26268@linux>

On 26.07.19 11:25, Oscar Salvador wrote:
> On Fri, Jul 26, 2019 at 10:48:54AM +0200, David Hildenbrand wrote:
>>> Signed-off-by: Oscar Salvador <osalvador@suse.de>
>>> ---
>>>  include/linux/mm.h         | 17 +++++++++++++++++
>>>  include/linux/mm_types.h   |  5 +++++
>>>  include/linux/page-flags.h | 19 +++++++++++++++++++
>>>  3 files changed, 41 insertions(+)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 45f0ab0ed4f7..432175f8f8d2 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -2904,6 +2904,23 @@ static inline bool debug_guardpage_enabled(void) { return false; }
>>>  static inline bool page_is_guard(struct page *page) { return false; }
>>>  #endif /* CONFIG_DEBUG_PAGEALLOC */
>>>  
>>> +static __always_inline struct page *vmemmap_head(struct page *page)
>>> +{
>>> +	return (struct page *)page->vmemmap_head;
>>> +}
>>> +
>>> +static __always_inline unsigned long vmemmap_nr_sections(struct page *page)
>>> +{
>>> +	struct page *head = vmemmap_head(page);
>>> +	return head->vmemmap_sections;
>>> +}
>>> +
>>> +static __always_inline unsigned long vmemmap_nr_pages(struct page *page)
>>> +{
>>> +	struct page *head = vmemmap_head(page);
>>> +	return head->vmemmap_pages - (page - head);
>>> +}
>>> +
>>>  #if MAX_NUMNODES > 1
>>>  void __init setup_nr_node_ids(void);
>>>  #else
>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>>> index 6a7a1083b6fb..51dd227f2a6b 100644
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>> @@ -170,6 +170,11 @@ struct page {
>>>  			 * pmem backed DAX files are mapped.
>>>  			 */
>>>  		};
>>> +		struct {        /* Vmemmap pages */
>>> +			unsigned long vmemmap_head;
>>> +			unsigned long vmemmap_sections; /* Number of sections */
>>> +			unsigned long vmemmap_pages;    /* Number of pages */
>>> +		};
>>>  
>>>  		/** @rcu_head: You can use this to free a page by RCU. */
>>>  		struct rcu_head rcu_head;
>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>>> index f91cb8898ff0..75f302a532f9 100644
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -708,6 +708,7 @@ PAGEFLAG_FALSE(DoubleMap)
>>>  #define PG_kmemcg	0x00000200
>>>  #define PG_table	0x00000400
>>>  #define PG_guard	0x00000800
>>> +#define PG_vmemmap     0x00001000
>>>  
>>>  #define PageType(page, flag)						\
>>>  	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
>>> @@ -764,6 +765,24 @@ PAGE_TYPE_OPS(Table, table)
>>>   */
>>>  PAGE_TYPE_OPS(Guard, guard)
>>>  
>>> +/*
>>> + * Vmemmap pages refer to those pages that are used to create the memmap
>>> + * array, and reside within the same memory range that was hotplugged, so
>>> + * they are self-hosted. (see include/linux/memory_hotplug.h)
>>> + */
>>> +PAGE_TYPE_OPS(Vmemmap, vmemmap)
>>> +static __always_inline void SetPageVmemmap(struct page *page)
>>> +{
>>> +	__SetPageVmemmap(page);
>>> +	__SetPageReserved(page);
>>
>> So, the issue with some vmemmap pages is that the "struct pages" reside
>> on the memory they manage. (it is nice, but complicated - e.g. when
>> onlining/offlining)
> 
> Hi David,
> 
> Not really.
> Vmemmap pages are simply skipped by the onlining/offlining handling.
> We do not need to a) send them to the buddy or b) migrate them over.
> A look at patch#4 will probably help, as the crux of the matter is there.

Right, but you have to prevent the onlining code from trying to
reinitialize the vmemmap when you online the first memory block. I will
dive into the details (patch #4) next (maybe not today, but early next
week) :)
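
Roughly what I picture for the onlining side (a sketch only, not your
patch #4 code; the function name and the loop structure here are
assumptions of mine):

/*
 * Sketch: pages hosting their own memmap were initialized at hot-add
 * time, so the onlining loop has to recognize and skip them instead of
 * reinitializing them or handing them to the buddy allocator.
 */
static void sketch_online_range(unsigned long start_pfn,
				unsigned long nr_pages)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
		struct page *page = pfn_to_page(pfn);

		/* Self-hosted memmap page: leave it alone. */
		if (PageVmemmap(page))
			continue;

		/* ... regular onlining: init the page, release it to the buddy ... */
	}
}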

> 
>>
>> I would expect that you properly initialize the struct pages for the
>> vmemmap pages (now it gets confusing :) ) when adding memory. The other
>> struct pages are initialized when onlining/offlining.
>>
>> So, at this point, the pages should already be marked reserved, no? Or
>> are the struct pages for the vmemmap never initialized?
>>
>> What zone do these vmemmap pages have? They are not assigned to any zone
>> and will never be :/
> 
> This patch is only a preparation, the real "fun" is in patch#4.
> 
> Vmemmap page initialization occurs in mhp_mark_vmemmap_pages, called from
> __add_pages() (patch#4).
> There we a) mark the page as vmemmap and b) initialize the fields we need
> to track some metadata (sections, pages and head).
> 
> In __init_single_page(), when onlining, the rest of the fields will be set up
> properly (zone, refcount, etc).
> 
> Chunk from patch#4:
> 
> static void __meminit __init_single_page(struct page *page, unsigned long pfn,
>                                 unsigned long zone, int nid)
> {
>         if (PageVmemmap(page))
>                 /*
>                  * Vmemmap pages need to preserve their state.
>                  */
>                 goto preserve_state;

Can you be sure there are no false positives? (if I remember correctly,
this memory might be completely uninitialized - I might be wrong)

> 
>         mm_zero_struct_page(page);
>         page_mapcount_reset(page);
>         INIT_LIST_HEAD(&page->lru);
> preserve_state:
>         init_page_count(page);
>         set_page_links(page, zone, nid, pfn);
>         page_cpupid_reset_last(page);
>         page_kasan_tag_reset(page);
> 
> So, vmemmap pages will fall within the same zone as the range we are
> adding; that does not change.

I wonder if that is the right thing to do, hmmm, because they are
effectively not part of that zone (they are not online).

Will have a look at the details :)
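
To check that I am reading the hot-add side correctly, this is roughly
the shape I would expect the marking step to have (my own sketch based on
your description and the struct page fields from this patch; the real
mhp_mark_vmemmap_pages in patch #4 may well look different):

/*
 * Sketch (not patch #4 code): mark the pages backing the memmap of a
 * hot-added range. @start is the first vmemmap page, @nr_vmemmap_pages
 * the number of such pages, @nr_sections the number of sections covered.
 */
static void sketch_mark_vmemmap_pages(struct page *start,
				      unsigned long nr_vmemmap_pages,
				      unsigned long nr_sections)
{
	unsigned long i;

	for (i = 0; i < nr_vmemmap_pages; i++) {
		struct page *page = start + i;

		SetPageVmemmap(page);	/* also sets PG_reserved per this patch */
		page->vmemmap_head = (unsigned long)start;
	}

	/* Only the head page carries the section/page counts. */
	start->vmemmap_sections = nr_sections;
	start->vmemmap_pages = nr_vmemmap_pages;
}

If that is the idea, then vmemmap_nr_pages() returning
head->vmemmap_pages - (page - head) gives the number of vmemmap pages
from @page to the end of the block, which makes sense to me.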

> 
>>> +}
>>> +
>>> +static __always_inline void ClearPageVmemmap(struct page *page)
>>> +{
>>> +	__ClearPageVmemmap(page);
>>> +	__ClearPageReserved(page);
>>
>> Are you sure you want to clear the reserved flag here? Is this function
>> really needed?
>>
>> (when you add memory, you can mark all relevant pages as vmemmap pages,
>> which is valid until removing the memory)
>>
>> Let's draw a picture so I am not confused
>>
>> [ ------ added memory ------ ]
>> [ vmemmap]
>>
>> The first page of the added memory is a vmemmap page AND contains its
>> own vmemmap, right?
> 
> Not only the first page.
> Depending on how large the chunk you are adding is, the number of vmemmap
> pages will vary, because we need to cover the memmap for the whole range.
> 
> e.g:
> 
>  - 128MB (1 section) = 512 vmemmap pages at the beginning of the range
>  - 256MB (2 sections) = 1024 vmemmap pages at the beginning of the range
>  ...
> 

Right.
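
(Just to double-check those numbers on my side, assuming 4 KiB base pages
and a 64-byte struct page:

  128 MiB = 32768 base pages -> 32768 * 64 B = 2 MiB of memmap = 512 pages
  256 MiB = 65536 base pages -> 65536 * 64 B = 4 MiB of memmap = 1024 pages

so the 512/1024 figures match.)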

>> When adding memory, you would initialize all struct pages of the
>> vmemmap (residing on itself) and mark them with SetPageVmemmap().
>>
>> When removing memory, there is nothing to do, all struct pages are
>> dropped. So why do we need the ClearPageVmemmap() ?
> 
> Well, it is not really needed as we only call ClearPageVmemmap when we are
> actually removing the memory with vmemmap_free()->...
> So one could argue that since the memory is going away, there is no need
> to clear anything in there.
> 
> I just made it for consistency purposes.
> 
> I can drop it if you feel strongly about it.

Not strongly, I was just wondering why it is needed at all in the big
picture :)

-- 

Thanks,

David / dhildenb

Thread overview: 22+ messages
2019-07-25 16:02 [PATCH v3 0/5] Allocate memmap from hotadded memory Oscar Salvador
2019-07-25 16:02 ` [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY Oscar Salvador
2019-07-26  8:34   ` David Hildenbrand
2019-07-26  9:29     ` Oscar Salvador
2019-07-26  9:37       ` David Hildenbrand
2019-07-25 16:02 ` [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type Oscar Salvador
2019-07-26  8:48   ` David Hildenbrand
2019-07-26  9:25     ` Oscar Salvador
2019-07-26  9:41       ` David Hildenbrand [this message]
2019-07-26 10:11         ` Oscar Salvador
2019-07-25 16:02 ` [PATCH v3 3/5] mm,sparse: Add SECTION_USE_VMEMMAP flag Oscar Salvador
2019-08-01 14:45   ` David Hildenbrand
2019-07-25 16:02 ` [PATCH v3 4/5] mm,memory_hotplug: Allocate memmap from the added memory range for sparse-vmemmap Oscar Salvador
2019-08-01 15:04   ` David Hildenbrand
2019-07-25 16:02 ` [PATCH v3 5/5] mm,memory_hotplug: Allow userspace to enable/disable vmemmap Oscar Salvador
2019-08-01 15:07   ` David Hildenbrand
2019-07-25 16:56 ` [PATCH v3 0/5] Allocate memmap from hotadded memory David Hildenbrand
2019-08-01  7:39 ` Oscar Salvador
2019-08-01  8:17   ` David Hildenbrand
2019-08-01  8:39     ` Oscar Salvador
2019-08-01  8:44       ` David Hildenbrand
2019-08-01 18:46 ` David Hildenbrand
