From: William Kucharski <william.kucharski@oracle.com>
To: Hillf Danton <hdanton@sina.com>
Cc: David Hildenbrand <david@redhat.com>,
	Zach O'Keefe <zokeefe@google.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Peter Xu <peterx@redhat.com>, Rik van Riel <riel@surriel.com>,
	Mike Rapoport <rppt@kernel.org>, Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: THP backed thread stacks
Date: Sun, 12 Mar 2023 04:39:05 +0000
Message-ID: <3A5D4B5D-FF0A-4AB5-8E86-24893FE82A9A@oracle.com>
In-Reply-To: <20230312005549.2609-1-hdanton@sina.com>



> On Mar 11, 2023, at 5:55 PM, Hillf Danton <hdanton@sina.com> wrote:
> 
> On 11 Mar 2023 12:24:58 +0000 William Kucharski <william.kucharski@oracle.com>
>>> On Mar 10, 2023, at 04:25, David Hildenbrand <david@redhat.com> wrote:
>>> On 10.03.23 02:40, William Kucharski wrote:
>>>>> On Mar 9, 2023, at 17:05, Zach O'Keefe <zokeefe@google.com> wrote:
>>>>>
>>>>>> I think the hugepage alignment in their environment was somewhat luck.
>>>>>> One suggestion made was to change stack size to avoid alignment and
>>>>>> hugepage usage.  That 'works' but seems kind of hackish.
>>>>>
>>>>> That was my first thought, if the alignment was purely due to luck,
>>>>> and not somebody manually specifying it. Agreed it's kind of hackish
>>>>> if anyone can get bit by this by sheer luck.
>>>> I don't agree it's "hackish" at all, but I go more into that below.
>>>>>
>>>>>> Also, David H pointed out the somewhat recent commit to align sufficiently
>>>>>> large mappings to THP boundaries.  This is going to make all stacks huge
>>>>>> page aligned.
>>>>>
>>>>> I think that change was reverted by Linus in commit 0ba09b173387
>>>>> ("Revert "mm: align larger anonymous mappings on THP boundaries""),
>>>>> until its perf regressions were better understood -- and I haven't
>>>>> seen a revamp of it.
>>>> It's too bad it was reverted, though I understand the concerns regarding it.
>>>> From my point of view, if an address is properly aligned and a caller is
>>>> asking for 2M+ to be mapped, it's going to be advantageous from a purely
>>>> system-focused point of view to do that mapping with a THP.
>>>
>>> Just noting that, if user space requests multiple smaller mappings, and
>>> the kernel decides to all place them in the same PMD, all VMAs might get
>>> merged and you end up with a properly aligned VMA where khugepaged would
>>> happily place a THP.
>>>
>>> That case is, of course, different to the "user space asks for 2M+"
>>> mapping case, but from khugepaged perspective they might look alike --
>>> and it might be unclear if a THP is valuable or not (IOW maybe that THP
>>> could be better used somewhere else).
>> 
>> That's a really, really good point.
>> 
>> My general philosophy on the subject (if the address is aligned and the
>> caller is asking for a THP-sized allocation, why not map it with a THP if
>> you can) kind of falls apart when it's the system noticing it can coalesce
>> a bunch of smaller allocations into one THP via khugepaged.
>>
>> Arguably it's the difference between the caller knowing it's asking for
>> something THP-sized on its behalf and the system deciding to remap a bunch
>> of disparate mappings using a THP because _it_ can.
>>
>> If we were to say allow a caller's request for a THP-sized allocation/mapping
>> take priority over those from khugepaged, it would not only be a major
>> vector for abuse, it would also lead to completely indeterminate behavior
>> ("When I start my browser after a reboot I get a bunch of THPs, but after
>> the system's been up for a few weeks, I don't, how come?")
> 
> Given transparent_hugepage_flags, how would it be abused?  And indetermined?

I was speaking in terms of heuristics: if we allowed callers making THP-mappable requests to take priority over khugepaged by default, it would be easy for callers to abuse that and claim most THP-mappable memory, leaving little for khugepaged to use when coalescing smaller pages.

This is much like the way hugetlbfs is sometimes used now: if callers can't get the allocations they require, users reboot the machine and make sure the applications requiring such allocations run first.

My apologies if I am missing an existing mechanism that prevents this; it's been a while since I walked through that code.

   -- Bill



