From: Ryan Roberts <ryan.roberts@arm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Gao Xiang <xiang@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Yang Shi <shy828301@gmail.com>, Michal Hocko <mhocko@suse.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v1 0/2] Swap-out small-sized THP without splitting
Date: Fri, 13 Oct 2023 17:31:20 +0100
Message-ID: <a9d7103f-efcb-4ab8-91b9-6f1737789c49@arm.com>
In-Reply-To: <87zg0pfyux.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 11/10/2023 07:37, Huang, Ying wrote:
> Ryan Roberts <ryan.roberts@arm.com> writes:
> 
> [...]
> 
>> Finally on testing, I've run the mm selftests and see no regressions, but I
>> don't think there is anything in there specifically aimed towards swap? Are
>> there any functional or performance tests that I should run? It would certainly
>> be good to confirm I haven't regressed PMD-size THP swap performance.
> 
> I have used the swap sub-test case of vm-scalability to test.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/

I ended up using `usemem`, which is the core of this test suite, but deviated
from the pre-canned test case so that I could use anonymous memory and get
numbers for small-sized THP. (This is a very useful tool - thanks for pointing
it out!)

I've run the tests on an Ampere Altra, set up with a 35G block RAM device as the
swap device, and from inside a memcg limited to 40G of memory. I've then run
`usemem` with 70 processes (each on its own core), each allocating and writing
1G of memory. I've repeated everything 5 times and taken the mean and stdev; the
results are in the two tables below.
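
For reference, the setup looked roughly like the sketch below. This is written
from memory rather than copied from the exact script, so treat the brd sizing,
the cgroup setup and especially the `usemem` option names as illustrative
assumptions (check `usemem --help` in your vm-scalability checkout for the real
flags):

```sh
# Build usemem from vm-scalability.
git clone https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git
make -C vm-scalability

# 35G block ram device used as the swap device (rd_size is in KiB).
modprobe brd rd_nr=1 rd_size=$((35 << 20))
mkswap /dev/ram0
swapon /dev/ram0

# Run everything from inside a memcg (cgroup v2 shown here) limited to 40G.
mkdir /sys/fs/cgroup/swap-test
echo 40G > /sys/fs/cgroup/swap-test/memory.max
echo $$ > /sys/fs/cgroup/swap-test/cgroup.procs

# 70 processes, each allocating and writing 1G of anonymous memory
# (option names are an assumption; substitute the actual usemem flags).
./vm-scalability/usemem -n 70 1G
```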


Mean Performance Improvement vs 4K/baseline

| alloc size |            baseline |    remove-huge-flag | swap-file-small-thp |
|            |  v6.6-rc4+anonfolio |           + patch 1 |           + patch 2 |
|:-----------|--------------------:|--------------------:|--------------------:|
| 4K Page    |                0.0% |                2.3% |                9.1% |
| 64K THP    |              -44.1% |              -46.3% |               30.6% |
| 2M THP     |               56.0% |               54.2% |               60.1% |


Standard Deviation as Percentage of Mean

| alloc size |            baseline |    remove-huge-flag | swap-file-small-thp |
|            |  v6.6-rc4+anonfolio |           + patch 1 |           + patch 2 |
|:-----------|--------------------:|--------------------:|--------------------:|
| 4K Page    |                3.4% |                7.1% |                1.7% |
| 64K THP    |                1.9% |                5.6% |                7.7% |
| 2M THP     |                1.9% |                2.1% |                3.2% |


I don't see any meaningful performance cost to removing the HUGE flag, so
hopefully this gives us confidence to move forward with patch 1.

You can indeed see the performance regression in the baseline when THP is
configured to allocate small-sized THP only (in this case 64K). And you can see
that the regression is fixed by patch 2, which avoids splitting the THP and thus
avoids the extra TLB invalidations (TLBIs). This correlates with what I saw in
the kernel compilation workload.
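
For context, the three "alloc size" rows in the tables above correspond to
restricting which anonymous THP sizes the kernel is allowed to allocate. A rough
sketch of how that selection might look; the per-size sysfs knobs shown here are
the mTHP interface that later landed upstream, so the exact paths on the
anonfolio-patched v6.6-rc4 kernel are an assumption and may differ:

```sh
# Select which anonymous THP sizes may be allocated.
# NOTE: the per-size paths below are the upstream mTHP interface; treat them
# as an assumption for the anonfolio-patched kernel used in these runs.

# 4K pages only: disable THP entirely.
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# 64K THP only: disable PMD-size (2M) THP, enable just the 64K size.
echo never  > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled

# 2M THP only: the traditional PMD-size THP path.
echo always > /sys/kernel/mm/transparent_hugepage/enabled
```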

Huang Ying, based on these results, do you still want me to pursue a per-CPU
solution to avoid potential contention on the swap info lock? If so, I proposed
in the thread against patch 2 to do this in the swap_slots layer rather than in
swapfile.c directly (I'm not sure how your original proposal would actually
work). But it's not obvious to me from these results that there is a definite
problem here, and it might be simpler to avoid the complexity?

Thanks,
Ryan

> 
> --
> Best Regards,
> Huang, Ying




Thread overview: 17+ messages
2023-10-10 14:21 [RFC PATCH v1 0/2] Swap-out small-sized THP without splitting Ryan Roberts
2023-10-10 14:21 ` [RFC PATCH v1 1/2] mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags Ryan Roberts
2023-10-11  7:43   ` Huang, Ying
2023-10-11  8:17   ` Kefeng Wang
2023-10-11 10:15     ` Ryan Roberts
2023-10-11 10:16     ` Ryan Roberts
2023-10-10 14:21 ` [RFC PATCH v1 2/2] mm: swap: Swap-out small-sized THP without splitting Ryan Roberts
2023-10-11  7:44   ` Ryan Roberts
2023-10-11  8:25   ` Huang, Ying
2023-10-11 10:36     ` Ryan Roberts
2023-10-11 17:14       ` Ryan Roberts
2023-10-16  6:17         ` Huang, Ying
2023-10-16 12:10           ` Ryan Roberts
2023-10-17  5:44             ` Huang, Ying
2023-10-11  6:37 ` [RFC PATCH v1 0/2] " Huang, Ying
2023-10-11  7:42   ` Ryan Roberts
2023-10-13 16:31   ` Ryan Roberts [this message]
