All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 1/5] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Mon, 15 Mar 2021 11:22:29 +0100	[thread overview]
Message-ID: <20210315102224.GA24699@linux> (raw)
In-Reply-To: <f600451e-48aa-184f-ae71-94e0abe9d6b1@redhat.com>

On Thu, Mar 11, 2021 at 08:06:53PM +0100, David Hildenbrand wrote:
> This looks essentially good to me, except some parts in
> mhp_supports_memmap_on_memory()
> 
> > +bool mhp_supports_memmap_on_memory(unsigned long size)
> > +{
> > +	unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
> > +	unsigned long remaining_mem = size - PMD_SIZE;

Hi David, thanks for the review!

> This looks weird. I think what you want to test is that
> 
> 
> a) "nr_vmemmap_pages * sizeof(struct page)" spans complete PMDs (IOW, we
> won't map too much via the altmap when populating a PMD in the vmemmap)
> 
> b) "remaining = size - nr_vmemmap_pages * sizeof(struct page)" spans
> complete pageblock.

We do not know the nr_vmemmap_pages at this point in time, although it is
easy to pre-compute.

For every section we populate, we use PMD_SIZE. So, PMD_SIZE/PAGE_SIZE lays
the nr_vmemmap_pages that are used for populating a single section.

But let me explain the reasoning I use in the current code:

I will enumarate the assumptions that must hold true in order to support the
feature together with their check:

- We span a single memory block

  size == memory_block_size_bytes()

- The vmemmap pages span a complete PMD and no more than a PMD.

  !(PMD_SIZE % sizeof(struct page))

- The vmemmap pages and the pages exposed to the buddy have to cover full
  pageblocks

  remaining_mem = size - PMD_SIZE;
  IS_ALIGNED(remaining_mem, pageblock_size)

  Although this check only covers the range without the vmemmap pages, one could
  argue that since we use only a PMD_SIZE at a time, we know that PMD_SIZE is
  pageblock aligned, so the vmemmap range is PMD_SIZE as well.

Now, I see how this might be confusing and rather incomplete.
So I guess a better and more clear way to write it would be:

 bool mhp_supports_memmap_on_memory(unsigned long size)
 {
         unsigned long nr_vmemmap_pages = PMD_SIZE / PAGE_SIZE;
         unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
         unsigned long remaining_size = size - vmemmap_size;

         return memmap_on_memory &&
                IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
                size == memory_block_size_bytes() &&
                !(PMD_SIZE % vmemmap_size) &&
                IS_ALIGNED(vmemmap_size, pageblock_size) &&
                remaining_size &&
                IS_ALIGNED(remaining_size, pageblock_size);
  }
                
Note that above check is only for a single section, but if assumptions hold true
for a single section, it will for many as well.
We could be orthodox and do:

 bool mhp_supports_memmap_on_memory(unsigned long size)
 {
         unsigned long nr_sections = (1ULL << SECTION_SHIFT) / memory_block_size_bytes;
         unsigned long nr_vmemmap_pages = (PMD_SIZE / PAGE_SIZE) * nr_sections;
         unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
         unsigned long remaining_size = size - vmemmap_size;

         return memmap_on_memory &&
                IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
                size == memory_block_size_bytes() &&
                !(PMD_SIZE % vmemmap_size) &&
                IS_ALIGNED(vmemmap_size, pageblock_size) &&
                remaining_size &&
                IS_ALIGNED(remaining_size, pageblock_size);
  }
        
to check for all sections, but I do not think it is necessary.

What do you think?
	
> I suggest a restructuring, compressing the information like:
> 
> "
> Besides having arch support and the feature enabled at runtime, we need a
> few more assumptions to hold true:
> 
> a) We span a single memory block: memory onlining/offlining happens in
> memory block granularity. We don't want the vmemmap of online memory blocks
> to reside on offline memory blocks. In the future, we might want to support
> variable-sized memory blocks to make the feature more versatile.
> 
> b) The vmemmap pages span complete PMDs: We don't want vmemmap code to
> populate memory from the altmap for unrelated parts (i.e., other memory
> blocks).
> 
> c) The vmemmap pages (and thereby the pages that will be exposed to the
> buddy) have to cover full pageblocks: memory onlining/offlining code
> requires applicable ranges to be page-aligned, for example, to set the
> migratetypes properly.
> "

I am fine with the above, I already added it, thanks.

> Do we have to special case / protect against the vmemmap optimization for
> hugetlb pages? Or is that already blocked somehow and I missed it?

Yes, hugetlb-vmemmap feature disables vmemmap on PMD population [1]

[1] https://patchwork.kernel.org/project/linux-mm/patch/20210308102807.59745-7-songmuchun@bytedance.com/


-- 
Oscar Salvador
SUSE L3

  reply	other threads:[~2021-03-15 10:23 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-09 17:55 [PATCH v4 0/5] Allocate memmap from hotadded memory (per device) Oscar Salvador
2021-03-09 17:55 ` [PATCH v4 1/5] mm,memory_hotplug: Allocate memmap from the added memory range Oscar Salvador
2021-03-11 19:06   ` David Hildenbrand
2021-03-15 10:22     ` Oscar Salvador [this message]
2021-03-16 16:46       ` David Hildenbrand
2021-03-16 17:45         ` David Hildenbrand
2021-03-17 14:08           ` Oscar Salvador
2021-03-17 14:35             ` David Hildenbrand
2021-03-18  8:27               ` Oscar Salvador
2021-03-18 10:38                 ` Oscar Salvador
2021-03-18 11:24                   ` David Hildenbrand
2021-03-18 12:03                     ` Oscar Salvador
2021-03-18 12:05                       ` David Hildenbrand
2021-03-09 17:55 ` [PATCH v4 2/5] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
2021-03-09 17:55 ` [PATCH v4 3/5] mm,memory_hotplug: Add kernel boot option to enable memmap_on_memory Oscar Salvador
2021-03-09 17:55 ` [PATCH v4 4/5] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
2021-03-09 17:55 ` [PATCH v4 5/5] arm64/Kconfig: " Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210315102224.GA24699@linux \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.