All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Michal Hocko <mhocko@kernel.org>, Hugh Dickins <hughd@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 5/7] mm/page_alloc: Protect PCP lists with a spinlock
Date: Thu, 16 Jun 2022 17:59:34 +0200	[thread overview]
Message-ID: <af8f5c7d-5f7f-17ad-835e-1fdea2d8dda8@suse.cz> (raw)
In-Reply-To: <20220613125622.18628-6-mgorman@techsingularity.net>

On 6/13/22 14:56, Mel Gorman wrote:
> Currently the PCP lists are protected by using local_lock_irqsave to
> prevent migration and IRQ reentrancy but this is inconvenient.  Remote
> draining of the lists is impossible and a workqueue is required and every
> task allocation/free must disable then enable interrupts which is
> expensive.
> 
> As preparation for dealing with both of those problems, protect the lists
> with a spinlock.  The IRQ-unsafe version of the lock is used because IRQs
> are already disabled by local_lock_irqsave.  spin_trylock is used in
> preparation for a time when local_lock could be used instead of
> lock_lock_irqsave.
> 
> The per_cpu_pages still fits within the same number of cache lines after
> this patch relative to before the series.
> 
> struct per_cpu_pages {
>         spinlock_t                 lock;                 /*     0     4 */
>         int                        count;                /*     4     4 */
>         int                        high;                 /*     8     4 */
>         int                        batch;                /*    12     4 */
>         short int                  free_factor;          /*    16     2 */
>         short int                  expire;               /*    18     2 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         struct list_head           lists[13];            /*    24   208 */
> 
>         /* size: 256, cachelines: 4, members: 7 */
>         /* sum members: 228, holes: 1, sum holes: 4 */
>         /* padding: 24 */
> } __attribute__((__aligned__(64)));
> 
> There is overhead in the fast path due to acquiring the spinlock even
> though the spinlock is per-cpu and uncontended in the common case.  Page
> Fault Test (PFT) running on a 1-socket reported the following results on a
> 1 socket machine.
> 
>                                      5.18.0-rc1               5.18.0-rc1
>                                         vanilla         mm-pcpdrain-v2r1
> Hmean     faults/sec-1   886331.5718 (   0.00%)   885462.7479 (  -0.10%)
> Hmean     faults/sec-3  2337706.1583 (   0.00%)  2332130.4909 *  -0.24%*
> Hmean     faults/sec-5  2851594.2897 (   0.00%)  2844123.9307 (  -0.26%)
> Hmean     faults/sec-7  3543251.5507 (   0.00%)  3516889.0442 *  -0.74%*
> Hmean     faults/sec-8  3947098.0024 (   0.00%)  3916162.8476 *  -0.78%*
> Stddev    faults/sec-1     2302.9105 (   0.00%)     2065.0845 (  10.33%)
> Stddev    faults/sec-3     7275.2442 (   0.00%)     6033.2620 (  17.07%)
> Stddev    faults/sec-5    24726.0328 (   0.00%)    12525.1026 (  49.34%)
> Stddev    faults/sec-7     9974.2542 (   0.00%)     9543.9627 (   4.31%)
> Stddev    faults/sec-8     9468.0191 (   0.00%)     7958.2607 (  15.95%)
> CoeffVar  faults/sec-1        0.2598 (   0.00%)        0.2332 (  10.24%)
> CoeffVar  faults/sec-3        0.3112 (   0.00%)        0.2587 (  16.87%)
> CoeffVar  faults/sec-5        0.8670 (   0.00%)        0.4404 (  49.21%)
> CoeffVar  faults/sec-7        0.2815 (   0.00%)        0.2714 (   3.60%)
> CoeffVar  faults/sec-8        0.2399 (   0.00%)        0.2032 (  15.28%)
> 
> There is a small hit in the number of faults per second but given that the
> results are more stable, it's borderline noise.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Tested-by: Minchan Kim <minchan@kernel.org>
> Acked-by: Minchan Kim <minchan@kernel.org>
> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

With comments:

> -static void free_unref_page_commit(struct page *page, int migratetype,
> -				   unsigned int order)
> +/* Returns true if the page was committed to the per-cpu list. */
> +static bool free_unref_page_commit(struct page *page, int migratetype,
> +				   unsigned int order, bool locked)

'bool locked' schemes are frowned upon. Although this does some preparatory
work outside of the locked section...

>  {
>  	struct zone *zone = page_zone(page);
>  	struct per_cpu_pages *pcp;
>  	int high;
>  	int pindex;
>  	bool free_high;
> +	unsigned long __maybe_unused UP_flags;
>  
>  	__count_vm_event(PGFREE);
>  	pcp = this_cpu_ptr(zone->per_cpu_pageset);
>  	pindex = order_to_pindex(migratetype, order);

... maybe it would be simpler to just do the locking always outside? We
aren't expecting any contention anyway except if draining is in progress,
and thus making the lock hold time a bunch of instructions longer won't make
any difference?

That said I don't know yet how patches 6 and 7 change this...

[...]

> @@ -3465,10 +3510,19 @@ void free_unref_page(struct page *page, unsigned int order)
>  void free_unref_page_list(struct list_head *list)
>  {
>  	struct page *page, *next;
> +	struct per_cpu_pages *pcp;
> +	struct zone *locked_zone;
>  	unsigned long flags;
>  	int batch_count = 0;
>  	int migratetype;
>  
> +	/*
> +	 * An empty list is possible. Check early so that the later
> +	 * lru_to_page() does not potentially read garbage.
> +	 */ 
> +	if (list_empty(list))
> +		return;

There's another check below and list_for_each_entry_safe() is fine with an
empty list, so this one seems unnecessary?

>  	/* Prepare pages for freeing */
>  	list_for_each_entry_safe(page, next, list, lru) {
>  		unsigned long pfn = page_to_pfn(page);
> @@ -3489,8 +3543,31 @@ void free_unref_page_list(struct list_head *list)
>  		}
>  	}
>  
> +	/*
> +	 * Preparation could have drained the list due to failing to prepare
> +	 * or all pages are being isolated.
> +	 */
> +	if (list_empty(list))
> +		return;
> +
>  	local_lock_irqsave(&pagesets.lock, flags);
> +
> +	page = lru_to_page(list);

  reply	other threads:[~2022-06-16 15:59 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-13 12:56 [PATCH v4 00/7] Drain remote per-cpu directly Mel Gorman
2022-06-13 12:56 ` [PATCH 1/7] mm/page_alloc: Add page->buddy_list and page->pcp_list Mel Gorman
2022-06-13 12:56 ` [PATCH 2/7] mm/page_alloc: Use only one PCP list for THP-sized allocations Mel Gorman
2022-06-13 12:56 ` [PATCH 3/7] mm/page_alloc: Split out buddy removal code from rmqueue into separate helper Mel Gorman
2022-06-13 12:56 ` [PATCH 4/7] mm/page_alloc: Remove mistaken page == NULL check in rmqueue Mel Gorman
2022-06-13 12:56 ` [PATCH 5/7] mm/page_alloc: Protect PCP lists with a spinlock Mel Gorman
2022-06-16 15:59   ` Vlastimil Babka [this message]
2022-06-13 12:56 ` [PATCH 6/7] mm/page_alloc: Remotely drain per-cpu lists Mel Gorman
2022-06-16 16:41   ` Vlastimil Babka
2022-06-13 12:56 ` [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock Mel Gorman
2022-06-15 22:43   ` Yu Zhao
     [not found]   ` <CGME20220615224855eucas1p1ea6d90c23ec9423dfe04b267f6dddd2a@eucas1p1.samsung.com>
2022-06-15 22:48     ` Marek Szyprowski
2022-06-15 23:04       ` Andrew Morton
2022-06-16  3:05         ` Yu Zhao
2022-06-17  7:55           ` Vlastimil Babka
2022-06-17  6:47         ` Marek Szyprowski
2022-06-21  9:21         ` Mel Gorman
2022-06-16 17:01   ` Vlastimil Babka
2022-06-16 21:07     ` Yu Zhao
2022-06-17  7:57       ` Vlastimil Babka
2022-06-21  9:27         ` Mel Gorman
2022-06-21  9:26     ` Mel Gorman
2022-06-17  9:39   ` Nicolas Saenz Julienne
2022-06-21  9:29     ` Mel Gorman
2022-06-21  9:31       ` Nicolas Saenz Julienne
2022-07-03  9:44   ` [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c kernel test robot
2022-07-03  9:44     ` kernel test robot
2022-07-03 20:22     ` Andrew Morton
2022-07-03 20:22       ` Andrew Morton
2022-07-05 13:51       ` Oliver Sang
2022-07-05 13:51         ` Oliver Sang
2022-07-06  9:55         ` Mel Gorman
2022-07-06  9:55           ` Mel Gorman
2022-07-06 11:53           ` Mel Gorman
2022-07-06 11:53             ` Mel Gorman
2022-07-06 14:21             ` Oliver Sang
2022-07-06 14:21               ` Oliver Sang
2022-07-06 14:52               ` Mel Gorman
2022-07-06 14:52                 ` Mel Gorman
2022-07-07  8:22                 ` Oliver Sang
2022-07-07  8:22                   ` Oliver Sang
2022-07-06 14:25           ` Oliver Sang
2022-07-06 14:25             ` Oliver Sang
2022-07-06 14:53             ` Mel Gorman
2022-07-06 14:53               ` Mel Gorman
2022-07-07 21:55         ` Vlastimil Babka
2022-07-07 21:55           ` Vlastimil Babka
2022-07-08 10:56           ` Mel Gorman
2022-07-08 10:56             ` Mel Gorman
2022-07-12  5:04             ` Oliver Sang
2022-07-12  5:04               ` Oliver Sang
2022-06-24 12:54 [PATCH v5 00/7] Drain remote per-cpu directly Mel Gorman
2022-06-24 12:54 ` [PATCH 5/7] mm/page_alloc: Protect PCP lists with a spinlock Mel Gorman
2022-07-04 12:31   ` Vlastimil Babka
2022-07-05  7:20     ` Mel Gorman
2022-07-04 16:32   ` Nicolas Saenz Julienne

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=af8f5c7d-5f7f-17ad-835e-1fdea2d8dda8@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=nsaenzju@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.