From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Michal Hocko <mhocko@kernel.org>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists
Date: Thu, 3 Jun 2021 13:12:10 +0200
Message-ID: <1c15b083-26f8-4473-80e6-bcc2f549ba41@suse.cz>
In-Reply-To: <20210603084621.24109-3-mgorman@techsingularity.net>

On 6/3/21 10:46 AM, Mel Gorman wrote:
> The per-cpu page allocator (PCP) only stores order-0 pages. This means
> that all THP and "cheap" high-order allocations including SLUB contend
> on the zone->lock. This patch extends the PCP allocator to store THP and
> "cheap" high-order pages. Note that struct per_cpu_pages increases in
> size to 256 bytes (4 cache lines) on x86-64.
> 
> Note that this is not necessarily a universal performance win because of
> how it is implemented. High-order pages can cause pcp->high to be exceeded
> prematurely for lower-orders so for example, a large number of THP pages
> being freed could release order-0 pages from the PCP lists. Hence, much
> depends on the allocation/free pattern as observed by a single CPU to
> determine if caching helps or hurts a particular workload.
> 
> That said, basic performance testing passed. The following is a netperf
> UDP_STREAM test which hits the relevant patches as some of the network
> allocations are high-order.
> 
> netperf-udp
>                                  5.13.0-rc2             5.13.0-rc2
>                            mm-pcpburst-v3r4   mm-pcphighorder-v1r7
> Hmean     send-64         261.46 (   0.00%)      266.30 *   1.85%*
> Hmean     send-128        516.35 (   0.00%)      536.78 *   3.96%*
> Hmean     send-256       1014.13 (   0.00%)     1034.63 *   2.02%*
> Hmean     send-1024      3907.65 (   0.00%)     4046.11 *   3.54%*
> Hmean     send-2048      7492.93 (   0.00%)     7754.85 *   3.50%*
> Hmean     send-3312     11410.04 (   0.00%)    11772.32 *   3.18%*
> Hmean     send-4096     13521.95 (   0.00%)    13912.34 *   2.89%*
> Hmean     send-8192     21660.50 (   0.00%)    22730.72 *   4.94%*
> Hmean     send-16384    31902.32 (   0.00%)    32637.50 *   2.30%*
> 
> From a functional point of view, a patch like this is necessary to
> make bulk allocation of high-order pages work with similar performance
> to order-0 bulk allocations. The bulk allocator is not updated in this
> series as it would have to be determined by bulk allocation users how
> they want to track the order of pages allocated with the bulk allocator.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Some comments below.

> ---
>  include/linux/mmzone.h |  20 +++++-
>  mm/internal.h          |   2 +-
>  mm/page_alloc.c        | 159 +++++++++++++++++++++++++++++------------
>  mm/swap.c              |   2 +-
>  4 files changed, 135 insertions(+), 48 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 0ed61f32d898..1ceaa5f44db6 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -333,6 +333,24 @@ enum zone_watermarks {
>  	NR_WMARK
>  };
>  
> +/*
> + * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER plus one additional
> + * for pageblock size for THP if configured.
> + */
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#define NR_PCP_THP 1
> +#else
> +#define NR_PCP_THP 0
> +#endif
> +#define NR_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1 + NR_PCP_THP))
> +
> +/*
> + * Shift to encode migratetype in order in the least significant bits and
> + * migratetype in the higher bits.

Hard for me to understand that comment. I would describe what the code does,
e.g. "Shift to encode migratetype and order in the same integer, with order in
the least significant bits ..." etc.

> + */
> +#define NR_PCP_ORDER_SHIFT 8

Also, ORDER_SHIFT is a bit of a misnomer: it's more precisely an ORDER_WIDTH,
and we are shifting migratetype by it, not order. I'm just comparing with how we
name the nid/zid/etc. bits in page flags.

> +#define NR_PCP_ORDER_MASK ((1<<NR_PCP_ORDER_SHIFT) - 1)
> +
>  #define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + z->watermark_boost)
>  #define low_wmark_pages(z) (z->_watermark[WMARK_LOW] + z->watermark_boost)
>  #define high_wmark_pages(z) (z->_watermark[WMARK_HIGH] + z->watermark_boost)
> @@ -349,7 +367,7 @@ struct per_cpu_pages {
>  #endif
>  
>  	/* Lists of pages, one per migrate type stored on the pcp-lists */
> -	struct list_head lists[MIGRATE_PCPTYPES];
> +	struct list_head lists[NR_PCP_LISTS];
>  };
>  
>  struct per_cpu_zonestat {
> diff --git a/mm/internal.h b/mm/internal.h
> index 8fd61e344966..4f5c22dd8987 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -203,7 +203,7 @@ extern void post_alloc_hook(struct page *page, unsigned int order,
>  					gfp_t gfp_flags);
>  extern int user_min_free_kbytes;
>  
> -extern void free_unref_page(struct page *page);
> +extern void free_unref_page(struct page *page, unsigned int order);
>  extern void free_unref_page_list(struct list_head *list);
>  
>  extern void zone_pcp_update(struct zone *zone, int cpu_online);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 99ddac0ffece..ffd2d07060eb 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -687,10 +687,53 @@ static void bad_page(struct page *page, const char *reason)
>  	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>  }
>  
> +static inline unsigned int order_to_pindex(int migratetype, int order)
> +{
> +	int base = order;
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	if (order > PAGE_ALLOC_COSTLY_ORDER) {
> +		VM_BUG_ON(order != pageblock_order);
> +		base = PAGE_ALLOC_COSTLY_ORDER + 1;
> +	}
> +#else
> +	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
> +#endif
> +
> +	return (MIGRATE_PCPTYPES * base) + migratetype;
> +}
> +
> +static inline int pindex_to_order(unsigned int pindex)
> +{
> +	int order = pindex / PAGE_ALLOC_COSTLY_ORDER;

This seems wrong; shouldn't we divide by MIGRATE_PCPTYPES? The two constants
just happen to be the same number (both are currently 3), so testing won't flag this.




