All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: zhenguo yao <yaozhenguo1@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	corbet@lwn.net, yaozhenguo@jd.com,
	Matthew Wilcox <willy@infradead.org>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH v4] hugetlbfs: Extend the definition of hugepages parameter to support node allocation
Date: Fri, 17 Sep 2021 16:59:01 +0300	[thread overview]
Message-ID: <YUSfJRnvYmGbIl1f@kernel.org> (raw)
In-Reply-To: <98a8ea20-5642-d332-d7b4-18e075a594fb@oracle.com>

Hi Mike,

On Wed, Sep 15, 2021 at 03:05:41PM -0700, Mike Kravetz wrote:
> Now, really CC'ing Mike, and sorry for misspelling your name
> 
> On 9/15/21 3:03 PM, Mike Kravetz wrote:
> > On 9/15/21 6:11 AM, zhenguo yao wrote:
> >> Andrew Morton <akpm@linux-foundation.org> 于2021年9月15日周三 上午11:50写道:
> >>>
> >>> On Thu,  9 Sep 2021 22:16:55 +0800 yaozhenguo <yaozhenguo1@gmail.com> wrote:
> >>>
> >>>> We can specify the number of hugepages to allocate at boot. But the
> >>>> hugepages is balanced in all nodes at present. In some scenarios,
> >>>> we only need hugepages in one node. For example: DPDK needs hugepages
> >>>> which are in the same node as NIC. if DPDK needs four hugepages of 1G
> >>>> size in node1 and system has 16 numa nodes. We must reserve 64 hugepages
> >>>> in kernel cmdline. But, only four hugepages are used. The others should
> >>>> be free after boot. If the system memory is low(for example: 64G), it will
> >>>> be an impossible task. So, Extending hugepages parameter to support
> >>>> specifying hugepages at a specific node.
> >>>> For example add following parameter:
> >>>>
> >>>> hugepagesz=1G hugepages=0:1,1:3
> >>>>
> >>>> It will allocate 1 hugepage in node0 and 3 hugepages in node1.
> >>>>
> >>>> ...
> >>>>
> >>>> @@ -2842,10 +2843,75 @@ static void __init gather_bootmem_prealloc(void)
> >>>>       }
> >>>>  }
> >>>>
> >>>> +static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
> >>>> +{
> >>>> +     unsigned long i;
> >>>> +     char buf[32];
> >>>> +
> >>>> +     for (i = 0; i < h->max_huge_pages_node[nid]; ++i) {
> >>>> +             if (hstate_is_gigantic(h)) {
> >>>> +                     struct huge_bootmem_page *m;
> >>>> +                     void *addr;
> >>>> +
> >>>> +                     addr = memblock_alloc_try_nid_raw(
> >>>> +                                     huge_page_size(h), huge_page_size(h),
> >>>> +                                     0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> >>>> +                     if (!addr)
> >>>> +                             break;
> >>>> +                     m = addr;
> >>>> +                     BUG_ON(!IS_ALIGNED(virt_to_phys(m), huge_page_size(h)));
> >>>
> >>> We try very hard to avoid adding BUG calls.  Is there any way in which
> >>> this code can emit a WARNing then permit the kernel to keep operating?
> >>>
> >> Maybe we can rewrite it as below:
> >>                         if (WARN(!IS_ALIGNED(virt_to_phys(m),
> >> huge_page_size(h)),
> >>                                 "HugeTLB: page addr:%p is not aligned\n", m))
> >>                                 break;
> >> @Mike,  Do you think it's OK?
> > 
> > Sorry, I have not yet reviewed the latest version of this patch.
> > Quick thought on this question.
> > 
> > The required alignment passed to memblock_alloc_try_nid_raw() is
> > huge_page_size(h).  Therefore, we know the virtual address m is
> > huge_page_size(h) aligned.  The BUG is just checking to make sure
> > the physical address associated with the virtual address is aligned
> > the same.  I really do not see how this could not be the case.
> > In fact, the memblock allocator finds a physical address with the
> > required alignment and then returns phys_to_virt(alloc).
> > Someone please correct me if I am wrong.  Otherwise, we can drop
> > the BUG.

I agree with your analysis and I also think the BUG() can be dropped
entirely as well as the BUG() in __alloc_bootmem_huge_page().

> > Adding Mike Rapport on Cc:
> > 
> > This allocation code and the associated BUG was copied from
> > __alloc_bootmem_huge_page().  The BUG was added 12 years ago before
> > the memblock allocator existed and we were using the bootmem allocator.
> > If there is no need for a BUG in hugetlb_hstate_alloc_pages_onenode,
> > there is no need for one in __alloc_bootmem_huge_page.

Hmm, even bootmem had alignment guaranties so it seems to me that the BUG()
was over-protective even then.

-- 
Sincerely yours,
Mike.

  reply	other threads:[~2021-09-17 13:59 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-09 14:16 [PATCH v4] hugetlbfs: Extend the definition of hugepages parameter to support node allocation yaozhenguo
2021-09-15  3:50 ` Andrew Morton
2021-09-15 13:11   ` zhenguo yao
2021-09-15 13:11     ` zhenguo yao
2021-09-15 22:03     ` Mike Kravetz
2021-09-15 22:05       ` Mike Kravetz
2021-09-17 13:59         ` Mike Rapoport [this message]
2021-09-15 22:38       ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YUSfJRnvYmGbIl1f@kernel.org \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=willy@infradead.org \
    --cc=yaozhenguo1@gmail.com \
    --cc=yaozhenguo@jd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.