linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	Muchun Song <songmuchun@bytedance.com>,
	corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, viro@zeniv.linux.org.uk,
	akpm@linux-foundation.org, mchehab+huawei@kernel.org,
	pawan.kumar.gupta@linux.intel.com, rdunlap@infradead.org,
	oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de,
	almasrymina@google.com, rientjes@google.com, willy@infradead.org,
	osalvador@suse.de, song.bao.hua@hisilicon.com, david@redhat.com,
	naoya.horiguchi@nec.com, joao.m.martins@oracle.com,
	duanxiongchun@bytedance.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, Chen Huang <chenhuang5@huawei.com>,
	Bodeddula Balasubramaniam <bodeddub@amazon.com>
Subject: Re: [PATCH v18 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Fri, 12 Mar 2021 09:15:21 +0100	[thread overview]
Message-ID: <YEsjGbKtyrpfas4C@dhcp22.suse.cz> (raw)
In-Reply-To: <4d4851fd-f0fd-9bfe-d271-b53891fdab6f@oracle.com>

On Thu 11-03-21 14:53:08, Mike Kravetz wrote:
> On 3/11/21 9:59 AM, Mike Kravetz wrote:
> > On 3/11/21 4:17 AM, Michal Hocko wrote:
> >>> Yeah per cpu preempt counting shouldn't be noticeable but I have to
> >>> confess I haven't benchmarked it.
> >>
> >> But all this seems moot now http://lkml.kernel.org/r/YEoA08n60+jzsnAl@hirez.programming.kicks-ass.net
> >>
> > 
> > The proper fix for free_huge_page independent of this series would
> > involve:
> > 
> > - Make hugetlb_lock and subpool lock irq safe
> > - Hand off freeing to a workque if the freeing could sleep
> > 
> > Today, the only time we can sleep in free_huge_page is for gigantic
> > pages allocated via cma.  I 'think' the concern about undesirable
> > user visible side effects in this case is minimal as freeing/allocating
> > 1G pages is not something that is going to happen at a high frequency.
> > My thinking could be wrong?
> > 
> > Of more concern, is the introduction of this series.  If this feature
> > is enabled, then ALL free_huge_page requests must be sent to a workqueue.
> > Any ideas on how to address this?
> > 
> 
> Thinking about this more ...
> 
> A call to free_huge_page has two distinct outcomes
> 1) Page is freed back to the original allocator: buddy or cma
> 2) Page is put on hugetlb free list
> 
> We can only possibly sleep in the first case 1.  In addition, freeing a
> page back to the original allocator involves these steps:
> 1) Removing page from hugetlb lists
> 2) Updating hugetlb counts: nr_hugepages, surplus
> 3) Updating page fields
> 4) Allocate vmemmap pages if needed as in this series
> 5) Calling free routine of original allocator
> 
> If hugetlb_lock is irq safe, we can perform the first 3 steps under that
> lock without issue.  We would then use a workqueue to perform the last
> two steps.  Since we are updating hugetlb user visible data under the
> lock, there should be no delays.  Of course, giving those pages back to
> the original allocator could still be delayed, and a user may notice
> that.  Not sure if that would be acceptable?

Well, having many in-flight huge pages can certainly be visible. Say you
are freeing hundreds of huge pages and your echo n > nr_hugepages will
return just for you to find out that the memory hasn't been freed and
therefore cannot be reused for another use - recently there was somebody
mentioning their usecase to free up huge pages to prevent OOM for
example. I do expect more people doing something like that.

Now, nr_hugepages can be handled by blocking on the same WQ until all
pre-existing items are processed. Maybe we will need to have a more
generic API to achieve the same for in kernel users but let's wait for
those requests.

> I think Muchun had a
> similar setup just for vmemmmap allocation in an early version of this
> series.
> 
> This would also require changes to where accounting is done in
> dissolve_free_huge_page and update_and_free_page as mentioned elsewhere.

Normalizing dissolve_free_huge_page is definitely a good idea. It is
really tricky how it sticks out and does half of the job of
update_and_free_page.

That being said, if it is possible to have a fully consistent h state
before handing over to WQ for sleeping operation then we should be all
fine. I am slightly worried about potential tricky situations where the
sleeping operation fails because that would require that page to be
added back to the pool again. As said above we would need some sort of
sync with in-flight operations before returning to the userspace.

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2021-03-12  8:16 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-08 10:27 [PATCH v18 0/9] Free some vmemmap pages of HugeTLB page Muchun Song
2021-03-08 10:27 ` [PATCH v18 1/9] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c Muchun Song
2021-03-10 14:14   ` Michal Hocko
2021-03-11  2:58     ` [External] " Muchun Song
2021-03-11  8:45       ` Muchun Song
2021-03-11  8:53         ` Michal Hocko
2021-03-11  9:05           ` Muchun Song
2021-03-08 10:28 ` [PATCH v18 2/9] mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2021-03-08 10:28 ` [PATCH v18 3/9] mm: hugetlb: free the vmemmap pages associated with each HugeTLB page Muchun Song
2021-03-10 14:32   ` Michal Hocko
2021-03-11  3:35     ` [External] " Muchun Song
2021-03-08 10:28 ` [PATCH v18 4/9] mm: hugetlb: alloc " Muchun Song
2021-03-10 14:21   ` Oscar Salvador
2021-03-11  4:13     ` [External] " Muchun Song
2021-03-10 15:19   ` Michal Hocko
2021-03-10 18:56     ` Mike Kravetz
2021-03-10 21:11       ` Michal Hocko
2021-03-10 21:49         ` Paul E. McKenney
2021-03-10 22:10           ` Mike Kravetz
2021-03-10 23:28             ` Paul E. McKenney
2021-03-11  8:40               ` Michal Hocko
2021-03-11 12:17                 ` Michal Hocko
2021-03-11 17:59                   ` Mike Kravetz
2021-03-11 22:53                     ` Mike Kravetz
2021-03-12  8:15                       ` Michal Hocko [this message]
2021-03-12 17:50                         ` Mike Kravetz
2021-03-11  4:26     ` [External] " Muchun Song
2021-03-11  8:46       ` Michal Hocko
2021-03-11  8:49         ` Muchun Song
2021-03-08 10:28 ` [PATCH v18 5/9] mm: hugetlb: set the PageHWPoison to the raw error page Muchun Song
2021-03-10 15:27   ` Michal Hocko
2021-03-11  6:34     ` [External] " Muchun Song
2021-03-11  8:50       ` Michal Hocko
2021-03-11  9:13         ` Muchun Song
2021-03-08 10:28 ` [PATCH v18 6/9] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap Muchun Song
2021-03-10 15:37   ` Michal Hocko
2021-03-10 17:15     ` Randy Dunlap
2021-03-11  6:36       ` [External] " Muchun Song
2021-03-11  6:36     ` Muchun Song
2021-03-08 10:28 ` [PATCH v18 7/9] mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate Muchun Song
2021-03-08 10:28 ` [PATCH v18 8/9] mm: hugetlb: gather discrete indexes of tail page Muchun Song
2021-03-10 15:39   ` Michal Hocko
2021-03-08 10:28 ` [PATCH v18 9/9] mm: hugetlb: optimize the code with the help of the compiler Muchun Song
2021-03-10 15:41   ` Michal Hocko
2021-03-11  7:33     ` [External] " Muchun Song
2021-03-11  8:55       ` Michal Hocko
2021-03-11  9:08         ` Muchun Song
2021-03-11  9:39           ` Michal Hocko
2021-03-11 10:00             ` Muchun Song
2021-03-11 12:16               ` Michal Hocko
2021-03-11 13:00                 ` Muchun Song
2021-03-11 13:45                 ` Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YEsjGbKtyrpfas4C@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=anshuman.khandual@arm.com \
    --cc=bodeddub@amazon.com \
    --cc=bp@alien8.de \
    --cc=chenhuang5@huawei.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=hpa@zytor.com \
    --cc=joao.m.martins@oracle.com \
    --cc=jroedel@suse.de \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=oneukum@suse.com \
    --cc=osalvador@suse.de \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=songmuchun@bytedance.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).