From: Minchan Kim <minchan@kernel.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [RFC PATCH 0/6] Drain remote per-cpu directly v2
Date: Tue, 10 May 2022 11:13:05 -0700	[thread overview]
Message-ID: <YnqrMckyHH3qvkdv@google.com> (raw)
In-Reply-To: <20220510092733.GE3441@techsingularity.net>

On Tue, May 10, 2022 at 10:27:33AM +0100, Mel Gorman wrote:
> On Mon, May 09, 2022 at 08:58:51AM -0700, Minchan Kim wrote:
> > On Mon, May 09, 2022 at 02:07:59PM +0100, Mel Gorman wrote:
> > > Changelog since v1
> > > o Fix unsafe RT locking scheme
> > > o Use spin_trylock on UP PREEMPT_RT
> > 
> > Mel,
> > 
> > 
> > Is this the only change from the previous version, which had some
> > deltas you fixed based on feedback from Vlastimil and me?
> > 
> 
> Full diff is below although it can also be generated by
> comparing the mm-pcpdrain-v1r8..mm-pcpdrain-v2r1 branches in
> https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/

I took the delta when I started testing so the test results
would still be valid. Thanks.

> 
> > And is it still RFC?
> > 
> 
> It's still RFC because it's a different approach to Nicolas' series and
> I want at least his Acked-by before taking the RFC label out and sending
> it to Andrew.
> 
> > Do you have some benchmark data?
> > 
> 
> Yes, but as reclaim is not fundamentally altered the main difference
> in behaviour is that work is done inline instead of being deferred to a
> workqueue. That means in some cases, system CPU usage of a task will be
> higher because it's paying the cost directly.

Sure, but the reclaim path is already expensive, so I doubt we could
see a sizable difference in system CPU usage.

What I wanted to see was whether there is a regression from adding
spin_lock/unlock instructions in the hot path. Since it is squeezed
into the same cacheline, I expected any regression to be marginal.
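
Just to show what I mean by the cacheline, here is a rough sketch,
assuming the lock added by patch 5/6 sits at the head of struct
per_cpu_pages next to the hot fields (the layout in your tree may
differ):

struct per_cpu_pages {
	spinlock_t lock;	/* Protects the lists below */
	int count;		/* number of pages in the list */
	int high;		/* high watermark, emptying needed */
	int batch;		/* chunk size for buddy add/remove */
	short free_factor;	/* batch scaling factor during free */
#ifdef CONFIG_NUMA
	short expire;		/* When 0, remote pagesets are drained */
#endif

	/* Lists of pages, one per migrate type stored on the pcp-lists */
	struct list_head lists[NR_PCP_LISTS];
};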

> 
> The workloads I used just hit reclaim directly to make sure it's
> functionally not broken. There is no change in page aging decisions,
> only timing of drains. I didn't check interference of a heavy workload
> interfering with a CPU-bound workload running on NOHZ CPUs as I assumed
> both you and Nicolas had a test case ready to use.

My workload does not involve NOHZ CPUs; it runs apps under heavy memory
pressure so they go into direct reclaim and get stuck in drain_all_pages
until the work on the workqueue runs.

unit: nanosecond
max(dur)        avg(dur)                count(dur)
166713013       487511.77786438033      1283

From the traces, the system hit drain_all_pages 1283 times; the
worst case was 166ms and the average was 487us.
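
For reference, this is roughly the pre-series path those tasks were
blocked on (a simplified sketch from my reading of __drain_all_pages(),
not verbatim code):

	/*
	 * The drain is handed off to a per-cpu worker and the caller
	 * waits for it, so a CPU monopolized by a higher-priority task,
	 * or with a backlog of pending work items, stalls
	 * drain_all_pages() for everyone.
	 */
	for_each_cpu(cpu, &cpus_with_pcps) {
		struct pcpu_drain *drain = per_cpu_ptr(&pcpu_drain, cpu);

		drain->zone = zone;
		INIT_WORK(&drain->work, drain_local_pages_wq);
		queue_work_on(cpu, mm_percpu_wq, &drain->work);
	}

	/* Blocks until every targeted CPU has actually run its worker */
	for_each_cpu(cpu, &cpus_with_pcps)
		flush_work(&per_cpu_ptr(&pcpu_drain, cpu)->work);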

The other problem was alloc_contig_range in CMA. The PCP draining
sometimes takes several hundred milliseconds even though there is no
memory pressure and only a few pages need to be migrated out, simply
because the CPUs are fully booked.

Your series completely removes that wasted time.

> 
> The main one I paid interest to was a fault latency benchmark in
> the presence of heavy reclaim called stutterp. It simulates a simple
> workload. One part uses a lot of anonymous memory, a second measures mmap
> latency and a third copies a large file.  The primary metric is checking
> for mmap latency. It was originally put together to debug interactivity
> issues on a desktop in the presence of heavy IO where the desktop
> applications were being pushed to backing storage.
> 
> stutterp
>                               5.18.0-rc1             5.18.0-rc1
>                                  vanilla       mm-pcpdrain-v2r1
> 1st-qrtle mmap-4      15.9557 (   0.00%)     15.4045 (   3.45%)
> 1st-qrtle mmap-6      10.8025 (   0.00%)     11.1204 (  -2.94%)
> 1st-qrtle mmap-8      16.9338 (   0.00%)     17.0595 (  -0.74%)
> 1st-qrtle mmap-12     41.4746 (   0.00%)      9.4003 (  77.33%)
> 1st-qrtle mmap-18     47.7123 (   0.00%)    100.0275 (-109.65%)
> 1st-qrtle mmap-24     17.7098 (   0.00%)     16.9633 (   4.22%)
> 1st-qrtle mmap-30     69.2565 (   0.00%)     38.2205 (  44.81%)
> 1st-qrtle mmap-32     49.1295 (   0.00%)     46.8819 (   4.57%)
> 3rd-qrtle mmap-4      18.4706 (   0.00%)     17.4799 (   5.36%)
> 3rd-qrtle mmap-6      11.4526 (   0.00%)     11.5567 (  -0.91%)
> 3rd-qrtle mmap-8      19.5903 (   0.00%)     19.0046 (   2.99%)
> 3rd-qrtle mmap-12     50.3095 (   0.00%)     25.3254 (  49.66%)
> 3rd-qrtle mmap-18     67.3319 (   0.00%)    147.6404 (-119.27%)
> 3rd-qrtle mmap-24     41.3779 (   0.00%)     84.4035 (-103.98%)
> 3rd-qrtle mmap-30    127.1375 (   0.00%)    148.8884 ( -17.11%)
> 3rd-qrtle mmap-32     79.7423 (   0.00%)    182.3042 (-128.62%)
> Max-99    mmap-4      46.9123 (   0.00%)     49.7731 (  -6.10%)
> Max-99    mmap-6      42.5414 (   0.00%)     16.6173 (  60.94%)
> Max-99    mmap-8      43.1237 (   0.00%)     23.3478 (  45.86%)
> Max-99    mmap-12     62.8025 (   0.00%)   1947.3862 (-3000.81%)
> Max-99    mmap-18  27936.8695 (   0.00%)    232.7122 (  99.17%)
> Max-99    mmap-24 204543.9436 (   0.00%)   5805.2478 (  97.16%)
> Max-99    mmap-30   2350.0289 (   0.00%)  10300.6344 (-338.32%)
> Max-99    mmap-32  56164.2271 (   0.00%)   7789.7526 (  86.13%)
> Max       mmap-4     840.3468 (   0.00%)   1137.4462 ( -35.35%)
> Max       mmap-6  255233.3996 (   0.00%)  91304.0952 (  64.23%)
> Max       mmap-8  210910.6497 (   0.00%) 117931.0796 (  44.08%)
> Max       mmap-12 108268.9537 (   0.00%) 319971.6910 (-195.53%)
> Max       mmap-18 608805.3195 (   0.00%) 197483.2205 (  67.56%)
> Max       mmap-24 327697.5605 (   0.00%) 382842.5356 ( -16.83%)
> Max       mmap-30 688684.5335 (   0.00%) 669992.7705 (   2.71%)
> Max       mmap-32 396842.0114 (   0.00%) 415978.2539 (  -4.82%)
> 
>                   5.18.0-rc1  5.18.0-rc1
>                      vanilla  mm-pcpdrain-v2r1
> Duration User        1438.08     1637.21
> Duration System     12267.41    10307.96
> Duration Elapsed     3929.70     3443.53
> 
> 
> It's a mixed bag but this workload is always a mixed bag and it's stressing
> reclaim.  At some points, latencies are worse, in others better. Overall,
> it completed faster and this was on a 1-socket machine.
> 
> On a 2-socket machine, the overall completion times were worse
> 
>                   5.18.0-rc1  5.18.0-rc1
>                      vanilla  mm-pcpdrain-v2r1
> Duration User        3713.75     2899.90
> Duration System    303507.56   378909.94
> Duration Elapsed    15444.59    19067.40
> 
> In general this type of workload is variable given the nature of what it
> does and can give different results on each execution. When originally
> designed, it was to deal with stalls lasting several seconds to reduce
> them to the sub-millisecond range.
> 
> The intent of the series is switching out-of-line work to in-line so
> what it should be measuring is interference effects and not straight-line
> performance and I haven't written a specific test case yet. When writing
> the series initially, it was to investigate if the PCP could be lockless
> and failing that, if disabling IRQs could be avoided in the common case.
> It just turned out that part of that made remote draining possible and
> I focused closer on that because it's more important.
> 
> > I'd like to give an Acked-by/Tested-by (even though there are a few
> > more places to align with the new field names in 1/6)
> 
> Which ones are of concern?
> 
> Some of the page->lru references I left alone in the init paths simply
> because in those contexts, the page wasn't on a buddy or PCP list. In
> free_unref_page_list the page is not on the LRU, it's just been isolated
> from the LRU. In alloc_pages_bulk, it's not on a buddy, pcp or LRU list
> and is just a list placeholder so I left it alone. In
> free_tail_pages_check the context was a page that was likely previously
> on a LRU.

Just nits: they are all list macros, e.g. as sketched below.

free_pcppages_bulk's list_last_entry should use pcp_list.

mark_free_pages's list_for_each_entry should use buddy_list.

__rmqueue_pcplist's list_first_entry should use pcp_list.
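
Concretely, the conversions I mean are along these lines (written from
memory as an illustration, not a tested diff):

free_pcppages_bulk():
-	page = list_last_entry(list, struct page, lru);
+	page = list_last_entry(list, struct page, pcp_list);

mark_free_pages():
-	list_for_each_entry(page,
-			&zone->free_area[order].free_list[t], lru) {
+	list_for_each_entry(page,
+			&zone->free_area[order].free_list[t], buddy_list) {

__rmqueue_pcplist():
-	page = list_first_entry(list, struct page, lru);
+	page = list_first_entry(list, struct page, pcp_list);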

> 
> > since I have
> > tested this patchset with my workload and didn't spot any other
> > problems.
> > 
> 
> Can you describe this workload, is it available anywhere and does it
> require Android to execute?

I described it above. It runs on Android, but I don't think it's an
Android-specific issue; anyone could see such long latencies from PCP
draining once one of the cores is monopolized by higher-priority
processes or too many pending kworks.

> 
> If you have positive results, it would be appreciated if you could post
> them or just note in a Tested-by/Acked-by that it had a measurable impact
> on the reclaim/cma path.

Sure.

For all patches in this series:

Tested-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>

Thanks.

Thread overview: 17+ messages
2022-05-09 13:07 [RFC PATCH 0/6] Drain remote per-cpu directly v2 Mel Gorman
2022-05-09 13:08 ` [PATCH 1/6] mm/page_alloc: Add page->buddy_list and page->pcp_list Mel Gorman
2022-05-13  8:41   ` Muchun Song
2022-05-26 10:14     ` Mel Gorman
2022-05-09 13:08 ` [PATCH 2/6] mm/page_alloc: Use only one PCP list for THP-sized allocations Mel Gorman
2022-05-09 13:08 ` [PATCH 3/6] mm/page_alloc: Split out buddy removal code from rmqueue into separate helper Mel Gorman
2022-05-09 13:08 ` [PATCH 4/6] mm/page_alloc: Remove unnecessary page == NULL check in rmqueue Mel Gorman
2022-05-09 13:08 ` [PATCH 5/6] mm/page_alloc: Protect PCP lists with a spinlock Mel Gorman
2022-05-22  2:49   ` Hugh Dickins
2022-05-24 12:12     ` Mel Gorman
2022-05-24 12:19       ` Mel Gorman
2022-05-09 13:08 ` [PATCH 6/6] mm/page_alloc: Remotely drain per-cpu lists Mel Gorman
2022-05-09 15:58 ` [RFC PATCH 0/6] Drain remote per-cpu directly v2 Minchan Kim
2022-05-10  9:27   ` Mel Gorman
2022-05-10 18:13     ` Minchan Kim [this message]
2022-05-11 12:47       ` Mel Gorman
2022-05-11 17:20         ` Minchan Kim
