linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@suse.com>, Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 1/5] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Fri, 26 Mar 2021 09:52:58 +0100	[thread overview]
Message-ID: <5be95091-b4ac-8e05-4694-ac5c65f790a4@redhat.com> (raw)
In-Reply-To: <YF2ct/UZUBG1GcM3@dhcp22.suse.cz>

On 26.03.21 09:35, Michal Hocko wrote:
> On Thu 25-03-21 23:06:50, Oscar Salvador wrote:
>> On Thu, Mar 25, 2021 at 05:47:30PM +0100, Michal Hocko wrote:
>>> On Thu 25-03-21 17:36:22, Michal Hocko wrote:
>>>> If all it takes is to make pfn_to_online_page work (and my
>>>> previous attempt is incorrect because it should consult block rather
>>>> than section pfn range)
>>>
>>> This should work.
>>
>> Sorry, but while this solves some of the issues with that approach, I really
>> think that overcomplicates things and buys us not so much in return.
>> To me it seems that we are just patching things to make it work that
>> way.
> 
> I do agree that special casing vmemmap areas is something I do not
> really like but we are in that schrödinger situation when this memory is
> not onlineable unless it shares memory section with an onlineable
> memory. There are two ways around that, either special case it on
> pfn_to_online_page or mark the vmemmap section online even though it is
> not really.
> 
>> To be honest, I dislike this, and I guess we can only agree to disagree
>> here.
> 
> No problem there. I will not insist on my approach unless I can convince
> you that it is a better solution. It seems I have failed and I can live
> with that.
> 
>> I find the following much easier, cleaner, and less risky to encounter
>> pitfalls in the future:
>>
>> (!!!It is untested and incomplete, and I would be surprised if it even
>> compiles, but it is enough as a PoC !!!)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 5ea2b3fbce02..799d14fc2f9b 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -169,6 +169,60 @@ int memory_notify(unsigned long val, void *v)
>>   	return blocking_notifier_call_chain(&memory_chain, val, v);
>>   }
>>
>> +static int memory_block_online(unsigned long start_pfn, unsigned long nr_pages,
>> +			       unsigned long nr_vmemmap_pages, int online_type,
>> +			       int nid)
>> +{
>> +	int ret;
>> +	/*
>> +	 * Despite vmemmap pages having a different lifecycle than the pages
>> +	 * they describe, initialiating and accounting vmemmap pages at the
>> +	 * online/offline stage eases things a lot.
> 
> This requires quite some explaining.
> 
>> +	 * We do it out of {online,offline}_pages, so those routines only have
>> +	 * to deal with pages that are actual usable memory.
>> +	 */
>> +	if (nr_vmemmap_pages) {
>> +		struct zone *z;
>> +
>> +		z = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
>> +		move_pfn_range_to_zone(z, start_pfn, nr_vmemmap_pages, NULL,
>> +				       MIGRATE_UNMOVABLE);
>> +		/*
>> +		 * The below could go to a helper to make it less bulky here,
>> +		 * so {online,offline}_pages could also use it.
>> +		 */
>> +		z->present_pages += nr_pages;
>> +		pgdat_resize_lock(z->zone_pgdat, &flags);
>> +		z->zone_pgdat->node_present_pages += nr_pages;
>> +		pgdat_resize_unlock(z->zone_pgdat, &flags);

Might have to set fully spanned section online. (vmemmap >= SECTION_SIZE)

>> +	}
>> +
>> +	ret = online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages,
>> +			   online_type);
>> +
>> +	/*
>> +	 * In case online_pages() failed for some reason, we should cleanup vmemmap
>> +	 * accounting as well.
>> +	 */
>> +	return ret;
>> +}
> 
> Yes this is much better! Just a minor suggestion would be to push
> memory_block all the way to memory_block_online (it oline a memory
> block). I would also slightly prefer to provide 2 helpers that would make
> it clear that this is to reserve/cleanup the vmemamp space (defined in
> the memory_hotplug proper).
> 
> Thanks!
> 

Something else to note:


We'll not call the memory notifier (e.g., MEM_ONLINE) for the vmemmap. 
The result is that

1. We won't allocate extended struct pages for the range. Don't think 
this is really problematic (pages are never allocated/freed, so I guess 
we don't care - like ZONE_DEVICE code).

2. We won't allocate kasan shadow memory. We most probably have to do it 
explicitly via kasan_add_zero_shadow()/kasan_remove_zero_shadow(), see 
mm/memremap.c:pagemap_range()


Further a locking rework might be necessary. We hold the device hotplug 
lock, but not the memory hotplug lock. E.g., for get_online_mems(). 
Might have to move that out online_pages.

-- 
Thanks,

David / dhildenb


  reply	other threads:[~2021-03-26  8:53 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-19  9:26 [PATCH v5 0/5] Allocate memmap from hotadded memory (per device) Oscar Salvador
2021-03-19  9:26 ` [PATCH v5 1/5] mm,memory_hotplug: Allocate memmap from the added memory range Oscar Salvador
2021-03-19 10:20   ` David Hildenbrand
2021-03-19 10:31     ` Oscar Salvador
2021-03-19 12:04       ` David Hildenbrand
2021-03-23 10:11   ` Michal Hocko
2021-03-24 10:12     ` Oscar Salvador
2021-03-24 12:03       ` Michal Hocko
2021-03-24 12:10         ` Michal Hocko
2021-03-24 12:23           ` David Hildenbrand
2021-03-24 12:37             ` Michal Hocko
2021-03-24 13:13               ` David Hildenbrand
2021-03-24 13:40                 ` Michal Hocko
2021-03-24 14:05                   ` David Hildenbrand
2021-03-24 13:27         ` Oscar Salvador
2021-03-24 14:42         ` Michal Hocko
2021-03-24 14:52           ` David Hildenbrand
2021-03-24 16:04             ` Michal Hocko
2021-03-24 19:16               ` David Hildenbrand
2021-03-25  8:07                 ` Oscar Salvador
2021-03-25  9:17                   ` Michal Hocko
2021-03-25 10:55                     ` Oscar Salvador
2021-03-25 11:08                       ` David Hildenbrand
2021-03-25 11:23                         ` Oscar Salvador
2021-03-25 12:35                         ` Michal Hocko
2021-03-25 12:40                           ` David Hildenbrand
2021-03-25 14:08                             ` Michal Hocko
2021-03-25 14:09                               ` David Hildenbrand
2021-03-25 14:34                                 ` Michal Hocko
2021-03-25 14:46                                   ` David Hildenbrand
2021-03-25 15:12                                     ` Michal Hocko
2021-03-25 15:19                                       ` David Hildenbrand
2021-03-25 15:35                                         ` Michal Hocko
2021-03-25 15:40                                           ` David Hildenbrand
2021-03-25 16:07                                           ` Michal Hocko
2021-03-25 16:20                                             ` David Hildenbrand
2021-03-25 16:36                                               ` Michal Hocko
2021-03-25 16:47                                                 ` Michal Hocko
2021-03-25 16:55                                                   ` David Hildenbrand
2021-03-25 22:06                                                   ` Oscar Salvador
2021-03-26  8:35                                                     ` Michal Hocko
2021-03-26  8:52                                                       ` David Hildenbrand [this message]
2021-03-26  8:57                                                         ` Oscar Salvador
2021-03-26 12:15                                                           ` Oscar Salvador
2021-03-26 13:36                                                             ` David Hildenbrand
2021-03-26 14:38                                                         ` Michal Hocko
2021-03-26 14:53                                                           ` David Hildenbrand
2021-03-26 15:31                                                             ` Michal Hocko
2021-03-26 16:03                                                               ` David Hildenbrand
2021-03-26  8:55                                                       ` Oscar Salvador
2021-03-26  9:11                                                         ` Michal Hocko
2021-03-25 18:08                                                 ` David Hildenbrand
2021-03-25 12:26                       ` Michal Hocko
2021-03-25 14:02                         ` Oscar Salvador
2021-03-25 14:40                           ` Michal Hocko
2021-03-19  9:26 ` [PATCH v5 2/5] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
2021-03-23 10:40   ` Michal Hocko
2021-03-19  9:26 ` [PATCH v5 3/5] mm,memory_hotplug: Add kernel boot option to enable memmap_on_memory Oscar Salvador
2021-03-23 10:47   ` Michal Hocko
2021-03-24  8:45     ` Oscar Salvador
2021-03-24  9:02       ` Michal Hocko
2021-03-19  9:26 ` [PATCH v5 4/5] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
2021-03-19  9:26 ` [PATCH v5 5/5] arm64/Kconfig: " Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5be95091-b4ac-8e05-4694-ac5c65f790a4@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).