linux-fsdevel.vger.kernel.org archive mirror
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Michal Hocko <mhocko@suse.com>, Muchun Song <songmuchun@bytedance.com>
Cc: "Jonathan Corbet" <corbet@lwn.net>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	"Peter Zijlstra" <peterz@infradead.org>,
	viro@zeniv.linux.org.uk,
	"Andrew Morton" <akpm@linux-foundation.org>,
	paulmck@kernel.org, mchehab+huawei@kernel.org,
	pawan.kumar.gupta@linux.intel.com,
	"Randy Dunlap" <rdunlap@infradead.org>,
	oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de,
	"Mina Almasry" <almasrymina@google.com>,
	"David Rientjes" <rientjes@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	"David Hildenbrand" <david@redhat.com>,
	"HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Xiongchun duan" <duanxiongchun@bytedance.com>,
	linux-doc@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	"Linux Memory Management List" <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [External] Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Tue, 16 Feb 2021 11:44:34 -0800
Message-ID: <29cdbd0f-dbc2-1a72-15b7-55f81000fa9e@oracle.com>
In-Reply-To: <YCqhDZ0EAgvCz+wX@dhcp22.suse.cz>

On 2/15/21 8:27 AM, Michal Hocko wrote:
> On Mon 15-02-21 23:36:49, Muchun Song wrote:
> [...]
>>> There shouldn't be any real reason why the memory allocation for
>>> vmemmaps, or handling vmemmap in general, has to be done from within the
>>> hugetlb lock and therefore requiring a non-sleeping semantic. All that
>>> can be deferred to a more relaxed context. If you want to make a
>>
>> Yeah, you are right. We can put the freeing hugetlb routine to a
>> workqueue. Just like I do in the previous version (before v13) patch.
>> I will pick up these patches.
> 
> I haven't seen your v13 and I will unlikely have time to revisit that
> version. I just wanted to point out that the actual allocation doesn't
> have to happen from under the spinlock. There are multiple ways to go
> around that. Dropping the lock would be one of them. Preallocation
> before the spin lock is taken is another. WQ is certainly an option but
> I would take it as the last resort when other paths are not feasible.

Sorry for jumping in late, Monday was a US holiday ...

IIRC, the point of moving the vmemmap allocations under the hugetlb_lock
was just for simplicity.  The idea was to modify the allocations to be
non-blocking so that allocating pages and restoring vmemmap could be done
as part of normal huge page freeing where we are holding the lock.  Perhaps
that is too simplistic an approach.

IMO, using the workqueue approach as done in previous patches introduces
too much complexity.

Michal did bring up the question "Do we really want to do all the vmemmap
allocation (even non-blocking) and manipulation under the hugetlb lock?"
I'm thinking the answer may be no.  For 1G pages, this will require 4094
calls to alloc_pages.  Even with non-blocking calls this seems like a long
time.

If we are not going to do the allocations under the lock, then we will need
to either preallocate or take the workqueue approach.  One complication with
preallocation is that we do not know for sure that we will be freeing the
huge page to buddy until we take the hugetlb_lock.  This is because the
decision to free or not is based on counters protected by the lock.  We could
of course check counters without the lock to guess if we will be freeing the
page, and then check again after acquiring the lock.  This may not be too bad
in the case of freeing a single page, but would become more complex when
doing bulk freeing.  After a little thought, the workqueue approach may even
end up simpler.  However, I would suggest a very simple workqueue
implementation with non-blocking allocations.  If we cannot quickly get
vmemmap pages, put the page back on the hugetlb free list and treat it as a
surplus page.
-- 
Mike Kravetz


Thread overview: 37+ messages
2021-02-08  8:50 [PATCH v15 0/8] Free some vmemmap pages of HugeTLB page Muchun Song
2021-02-08  8:50 ` [PATCH v15 1/8] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c Muchun Song
2021-02-08  8:50 ` [PATCH v15 2/8] mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2021-02-08  8:50 ` [PATCH v15 3/8] mm: hugetlb: free the vmemmap pages associated with each HugeTLB page Muchun Song
2021-02-08  8:50 ` [PATCH v15 4/8] mm: hugetlb: alloc " Muchun Song
2021-02-11 18:05   ` Mike Kravetz
2021-02-12 14:15     ` David Hildenbrand
2021-02-12 15:32   ` Michal Hocko
2021-02-15 10:05     ` [External] " Muchun Song
2021-02-15 10:33       ` Michal Hocko
2021-02-15 11:51         ` Muchun Song
2021-02-15 12:00           ` Muchun Song
2021-02-15 12:18             ` Michal Hocko
2021-02-15 12:44               ` Muchun Song
2021-02-15 13:19                 ` Michal Hocko
2021-02-15 15:36                   ` Muchun Song
2021-02-15 16:27                     ` Michal Hocko
2021-02-15 17:48                       ` Muchun Song
2021-02-15 18:19                         ` Muchun Song
2021-02-15 19:39                           ` Michal Hocko
2021-02-16  4:34                             ` Muchun Song
2021-02-16  8:15                               ` Michal Hocko
2021-02-16  8:20                                 ` David Hildenbrand
2021-02-16  9:03                                 ` Muchun Song
2021-02-15 19:02                         ` Michal Hocko
2021-02-16  8:13                           ` David Hildenbrand
2021-02-16  8:21                             ` Michal Hocko
2021-02-16 19:44                       ` Mike Kravetz [this message]
2021-02-17  8:13                         ` Michal Hocko
2021-02-18  1:00                           ` Mike Kravetz
2021-02-18  3:20                             ` Muchun Song
2021-02-18  8:21                               ` Michal Hocko
2021-02-15 12:24           ` Michal Hocko
2021-02-08  8:50 ` [PATCH v15 5/8] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap Muchun Song
2021-02-08  8:50 ` [PATCH v15 6/8] mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate Muchun Song
2021-02-08  8:50 ` [PATCH v15 7/8] mm: hugetlb: gather discrete indexes of tail page Muchun Song
2021-02-08  8:50 ` [PATCH v15 8/8] mm: hugetlb: optimize the code with the help of the compiler Muchun Song
