All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, frederic@kernel.org, tglx@linutronix.de,
	mgorman@suse.de, linux-rt-users@vger.kernel.org, vbabka@suse.cz,
	cl@linux.com, paulmck@kernel.org, willy@infradead.org
Subject: Re: [PATCH 2/2] mm/page_alloc: Add remote draining support to per-cpu lists
Date: Tue, 8 Feb 2022 12:47:10 -0300	[thread overview]
Message-ID: <YgKQfsznPUDN34un@fuller.cnet> (raw)
In-Reply-To: <20220208100750.1189808-3-nsaenzju@redhat.com>

On Tue, Feb 08, 2022 at 11:07:50AM +0100, Nicolas Saenz Julienne wrote:
> The page allocator's per-cpu page lists (pcplists) are currently
> protected using local_locks. While performance savvy, this doesn't allow
> for remote access to these structures. CPUs requiring system-wide
> changes to the per-cpu lists get around this by scheduling
> workers on each CPU. That said, some setups like NOHZ_FULL CPUs,
> aren't well suited to this since they can't handle interruptions
> of any sort.
> 
> To mitigate this, replace the current draining mechanism with one that
> allows remotely draining the lists:
> 
>  - Each CPU now has two pcplists pointers: one that points to a pcplists
>    instance that is in-use, 'pcp->lp', another that points to an idle
>    and empty instance, 'pcp->drain'. CPUs access their local pcplists
>    through 'pcp->lp' and the pointer is dereferenced atomically.
> 
>  - When a CPU decides it needs to empty some remote pcplists, it'll
>    atomically exchange the remote CPU's 'pcp->lp' and 'pcp->drain'
>    pointers. A remote CPU racing with this will either have:
> 
>      - An old 'pcp->lp' reference, it'll soon be emptied by the drain
>        process, we just have to wait for it to finish using it.
> 
>      - The new 'pcp->lp' reference, that is, an empty pcplists instance.
>        rcu_replace_pointer()'s release semantics ensures any prior
>        changes will be visible by the remote CPU, for example: changes
>        to 'pcp->high' and 'pcp->batch' when disabling the pcplists.
> 
>  - The CPU that started the drain can now wait for an RCU grace period
>    to make sure the remote CPU is done using the old pcplists.
>    synchronize_rcu() counts as a full memory barrier, so any changes the
>    local CPU makes to the soon to be drained pcplists will be visible to
>    the draining CPU once it returns.
> 
>  - Then the CPU can safely free the old pcplists. Nobody else holds a
>    reference to it. Note that concurrent access to the remote pcplists
>    drain is protected by the 'pcpu_drain_mutex'.
> 
> >From an RCU perspective, we're only protecting access to the pcplists
> pointer, the drain operation is the writer and the local_lock critical
> sections are the readers. RCU guarantees atomicity both while
> dereferencing the pcplists pointer and replacing it. It also checks for
> RCU critical section/locking correctness, as all readers have to hold
> their per-cpu pagesets local_lock, which also counts as a critical
> section from RCU's perspective.
> 
> >From a performance perspective, on production configurations, the patch
> adds an extra dereference to all hot paths (under such circumstances
> rcu_dereference() will simplify to READ_ONCE()). Extensive measurements
> have been performed on different architectures to ascertain the
> performance impact is minimal. Most platforms don't see any difference
> and the worst-case scenario shows a 1-3% degradation on a page
> allocation micro-benchmark. See cover letter for in-depth results.
> 
> Accesses to the pcplists like the ones in mm/vmstat.c don't require RCU
> supervision since they can handle outdated data, but they do use
> rcu_access_pointer() to avoid compiler weirdness make sparse happy.
> 
> Note that special care has been taken to verify there are no races with
> the memory hotplug code paths. Notably with calls to zone_pcp_reset().
> As Mel Gorman explains in a previous patch[1]: "The existing hotplug
> paths guarantees the pcplists cannot be used after zone_pcp_enable()
> [the one in offline_pages()]. That should be the case already because
> all the pages have been freed and there is no page to put on the PCP
> lists."
> 
> All in all, this technique allows for remote draining on all setups with
> an acceptable performance impact. It benefits all sorts of use cases:
> low-latency, real-time, HPC, idle systems, KVM guests.
> 
> [1] 8ca559132a2d ("mm/memory_hotplug: remove broken locking of zone PCP
>     structures during hot remove")
> 
> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
> ---
> 
> Changes since RFC:
>  - Avoid unnecessary spin_lock_irqsave/restore() in free_pcppages_bulk()
>  - Add more detail to commit and code comments.
>  - Use synchronize_rcu() instead of synchronize_rcu_expedited(), the RCU
>    documentation says to avoid it unless really justified. I don't think
>    it's justified here, if we can schedule and join works, waiting for
>    an RCU grace period is OK.

https://patchwork.ozlabs.org/project/netdev/patch/1306228052.3026.16.camel@edumazet-laptop/

Adding 100ms to direct reclaim path might be problematic. It will also
slowdown kcompactd (note it'll call drain_all_pages for each zone).

>  - Avoid sparse warnings by using rcu_access_pointer() and
>    rcu_dereference_protected().
> 
>  include/linux/mmzone.h |  22 +++++-
>  mm/page_alloc.c        | 155 ++++++++++++++++++++++++++---------------
>  mm/vmstat.c            |   6 +-
>  3 files changed, 120 insertions(+), 63 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b4cb85d9c6e8..b0b593fd8e48 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -388,13 +388,31 @@ struct per_cpu_pages {
>  	short expire;		/* When 0, remote pagesets are drained */
>  #endif
>  
> -	struct pcplists *lp;
> +	/*
> +	 * As a rule of thumb, any access to struct per_cpu_pages's 'lp' has
> +	 * happen with the pagesets local_lock held and using
> +	 * rcu_dereference_check(). If there is a need to modify both
> +	 * 'lp->count' and 'lp->lists' in the same critical section 'pcp->lp'
> +	 * can only be derefrenced once. See for example:

Typo.


  reply	other threads:[~2022-02-08 16:32 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-08 10:07 [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support Nicolas Saenz Julienne
2022-02-08 10:07 ` [PATCH 1/2] mm/page_alloc: Access lists in 'struct per_cpu_pages' indirectly Nicolas Saenz Julienne
2022-03-03 14:33   ` Marcelo Tosatti
2022-02-08 10:07 ` [PATCH 2/2] mm/page_alloc: Add remote draining support to per-cpu lists Nicolas Saenz Julienne
2022-02-08 15:47   ` Marcelo Tosatti [this message]
2022-02-15  8:47     ` Nicolas Saenz Julienne
2022-02-15 17:32       ` Paul E. McKenney
2022-02-09  8:55 ` [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support Xiongfeng Wang
2022-02-09  9:45   ` Nicolas Saenz Julienne
2022-02-09 11:26     ` Xiongfeng Wang
2022-02-09 11:36       ` Nicolas Saenz Julienne
2022-02-10 10:59 ` Xiongfeng Wang
2022-02-10 11:04   ` Nicolas Saenz Julienne
2022-03-03 11:45 ` Mel Gorman
2022-03-07 13:57   ` Nicolas Saenz Julienne
2022-03-10 16:31     ` Mel Gorman
2022-03-07 20:47   ` Marcelo Tosatti
2022-03-24 18:59   ` Nicolas Saenz Julienne
2022-03-25 10:48     ` Mel Gorman
2022-03-28 13:51       ` Nicolas Saenz Julienne
2022-03-29  9:45         ` Mel Gorman
2022-03-30 11:29   ` Nicolas Saenz Julienne
2022-03-31 15:24     ` Mel Gorman
2022-03-03 13:27 ` Vlastimil Babka
2022-03-03 14:10   ` Nicolas Saenz Julienne

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YgKQfsznPUDN34un@fuller.cnet \
    --to=mtosatti@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=nsaenzju@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.