From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Matthew Wilcox <willy@infradead.org>
Cc: "Huang, Ying" <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Yin Fengwei <fengwei.yin@intel.com>, Yu Zhao <yuzhao@google.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Yang Shi <shy828301@gmail.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 4/5] mm: FLEXIBLE_THP for improved performance
Date: Fri, 7 Jul 2023 21:06:48 +0200
Message-ID: <b1e7c52c-cc3a-92c8-e466-3ba5ec2ba2fb@redhat.com>
In-Reply-To: <c5eb896b-dbb4-396d-62f6-5d5dde2d7df6@arm.com>

>>> I still feel that it would be better for the thp and large anon folio controls
>>> to be independent though - what's the argument for tying them together?
>>
>> Thinking about desired 2 MiB flexible THP on aarch64 (64k kernel) vs. 2 MiB PMD
>> THP on aarch64 (4k kernel), how are they any different? Just the way they are
>> mapped ...
> 
> The last patch in the series shows my current approach to that:
> 
> int arch_wants_pte_order(struct vm_area_struct *vma)
> {
> 	if (hugepage_vma_check(vma, vma->vm_flags, false, true, true))
> 		return CONFIG_ARM64_PTE_ORDER_THP; <<< always the contpte size
> 	else
> 		return CONFIG_ARM64_PTE_ORDER_NOTHP; <<< limited to 64K
> }
> 
> But Yu has raised concerns that this type of policy needs to be in the core mm.
> So we could have the arch blindly return the preferred order from HW perspective
> (which would be contpte size for arm64). Then for !hugepage_vma_check(), mm
> could take the min of that value and some determined "acceptable" limit (which
> in my mind is 64K ;-).

Yeah, it's really tricky. Because why should arm64 with 64k base pages 
*not* return 2 MiB (which is one possible cont-pte size, IIRC)?

I share the idea that 64k might *currently* on *some platforms* be a 
reasonable choice. But that's where the "fun" begins.
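
To make the sizes in this comparison concrete, here is a small sketch in C. It
assumes 8-byte page-table entries (so one table page holds page_size / 8
entries) and the default arm64 contiguous-PTE entry counts as I recall them
(16 for 4k, 128 for 16k, 32 for 64k base pages). The helper names pmd_size()
and contpte_size() are made up for illustration and are not kernel functions.

```c
#include <stddef.h>

/*
 * Bytes mapped by one PMD entry, assuming 8-byte page-table entries,
 * i.e. one page-table page holds page_size / 8 PTEs.
 */
static size_t pmd_size(size_t page_size)
{
	return page_size * (page_size / 8);
}

/*
 * Bytes covered by one contiguous-PTE block. The entry counts are an
 * assumption based on the default arm64 configuration; other base page
 * sizes simply fall back to a single page here.
 */
static size_t contpte_size(size_t page_size)
{
	switch (page_size) {
	case 4096:
		return page_size * 16;	/* 64 KiB */
	case 16384:
		return page_size * 128;	/* 2 MiB */
	case 65536:
		return page_size * 32;	/* 2 MiB */
	default:
		return page_size;
	}
}
```

Under these assumptions a 64k kernel has a 2 MiB cont-pte block and a 512 MiB
PMD, while a 4k kernel has a 64 KiB cont-pte block and a 2 MiB PMD, which is
why "2 MiB" can mean either mapping type depending on the base page size.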

> 
>>
>> It's easy to say "64k vs. 2 MiB" is a difference and we want separate controls,
>> but how is "2MiB vs. 2 MiB" different?
>>
>> That said, I think we have to make up our mind how much control we want
>> to give user space. Again, the "2MiB vs. 2 MiB" case nicely shows that it's not
>> trivial: memory waste is a real issue on some systems where we limit THP to
>> madvise().
>>
>>
>> Just throwing it out for discussion:
>>
>> What about keeping the "always / madvise / never" semantics (and MADV_NOHUGEPAGE
>> ...) but having an additional config knob that specifies in which cases we
>> *still* allow flexible THP even though the system was configured for "madvise".
>>
>> I can't come up with a good name for that, but something like
>> "max_auto_size=64k" could be something reasonable to set. We could have an
>> arch+hw specific default.
> 
> Ahha, yes, that's essentially what I have above. I personally also like the idea
> of the limit being an absolute value rather than an order. Although I know Yu
> feels differently (see [1]).

Exposed to user space, I think it should be a human-readable value. 
Inside the kernel, I don't particularly care.

(Having databases/VMs on aarch64 with 64k base pages in mind) I think it 
might be interesting to have something like the following:

thp=madvise
max_auto_size=64k/128k/256k


So in MADV_HUGEPAGE VMAs (such as under QEMU), we'd happily take any 
flexible THP, including ones smaller than a PMD THP (512 MiB). 2 MiB or 
4 MiB THP? Sure, give them to my VM. You're hardly going to find 512 MiB 
THP in practice anyway ....

But for the remainder of my system, just do something reasonable and 
don't go crazy on the memory waste.
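
A minimal sketch of that policy in C, purely for illustration: it assumes the
arch reports its preferred order unconditionally, that "thp_eligible" stands
for a VMA that passes the existing THP check (e.g. MADV_HUGEPAGE under
thp=madvise), and that max_auto_size is given in bytes. Both function names
are hypothetical and nothing here is taken from the patch series; "never" and
MADV_NOHUGEPAGE handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: largest order whose folio size fits in limit_bytes. */
static int size_to_order(size_t limit_bytes, size_t page_size)
{
	size_t pages;
	int order = 0;

	assert(page_size != 0);
	pages = limit_bytes / page_size;
	while (pages > 1) {
		pages >>= 1;
		order++;
	}
	return order;
}

/*
 * Hypothetical policy: THP-eligible VMAs get whatever order the arch
 * prefers; everything else is capped at max_auto_size.
 */
static int pick_anon_folio_order(int arch_order, bool thp_eligible,
				 size_t max_auto_size, size_t page_size)
{
	int limit;

	if (thp_eligible)
		return arch_order;

	limit = size_to_order(max_auto_size, page_size);
	return arch_order < limit ? arch_order : limit;
}
```

With 64k base pages and an arch preference of order 5 (2 MiB),
max_auto_size=64k would leave non-eligible VMAs at order 0, while
max_auto_size=256k would allow up to order 2.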


I'll try reading all the previous discussions next week.

-- 
Cheers,

David / dhildenb

