linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Michal Hocko <mhocko@kernel.org>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists
Date: Thu, 3 Jun 2021 13:12:10 +0200	[thread overview]
Message-ID: <1c15b083-26f8-4473-80e6-bcc2f549ba41@suse.cz> (raw)
In-Reply-To: <20210603084621.24109-3-mgorman@techsingularity.net>

On 6/3/21 10:46 AM, Mel Gorman wrote:
> The per-cpu page allocator (PCP) only stores order-0 pages. This means
> that all THP and "cheap" high-order allocations including SLUB contends
> on the zone->lock. This patch extends the PCP allocator to store THP and
> "cheap" high-order pages. Note that struct per_cpu_pages increases in
> size to 256 bytes (4 cache lines) on x86-64.
> 
> Note that this is not necessarily a universal performance win because of
> how it is implemented. High-order pages can cause pcp->high to be exceeded
> prematurely for lower-orders so for example, a large number of THP pages
> being freed could release order-0 pages from the PCP lists. Hence, much
> depends on the allocation/free pattern as observed by a single CPU to
> determine if caching helps or hurts a particular workload.
> 
> That said, basic performance testing passed. The following is a netperf
> UDP_STREAM test which hits the relevant patches as some of the network
> allocations are high-order.
> 
> netperf-udp
>                                  5.13.0-rc2             5.13.0-rc2
>                            mm-pcpburst-v3r4   mm-pcphighorder-v1r7
> Hmean     send-64         261.46 (   0.00%)      266.30 *   1.85%*
> Hmean     send-128        516.35 (   0.00%)      536.78 *   3.96%*
> Hmean     send-256       1014.13 (   0.00%)     1034.63 *   2.02%*
> Hmean     send-1024      3907.65 (   0.00%)     4046.11 *   3.54%*
> Hmean     send-2048      7492.93 (   0.00%)     7754.85 *   3.50%*
> Hmean     send-3312     11410.04 (   0.00%)    11772.32 *   3.18%*
> Hmean     send-4096     13521.95 (   0.00%)    13912.34 *   2.89%*
> Hmean     send-8192     21660.50 (   0.00%)    22730.72 *   4.94%*
> Hmean     send-16384    31902.32 (   0.00%)    32637.50 *   2.30%*
> 
> From a functional point of view, a patch like this is necessary to
> make bulk allocation of high-order pages work with similar performance
> to order-0 bulk allocations. The bulk allocator is not updated in this
> series as it would have to be determined by bulk allocation users how
> they want to track the order of pages allocated with the bulk allocator.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Some comments below.

> ---
>  include/linux/mmzone.h |  20 +++++-
>  mm/internal.h          |   2 +-
>  mm/page_alloc.c        | 159 +++++++++++++++++++++++++++++------------
>  mm/swap.c              |   2 +-
>  4 files changed, 135 insertions(+), 48 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 0ed61f32d898..1ceaa5f44db6 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -333,6 +333,24 @@ enum zone_watermarks {
>  	NR_WMARK
>  };
>  
> +/*
> + * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER plus one additional
> + * for pageblock size for THP if configured.
> + */
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#define NR_PCP_THP 1
> +#else
> +#define NR_PCP_THP 0
> +#endif
> +#define NR_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1 + NR_PCP_THP))
> +
> +/*
> + * Shift to encode migratetype in order in the least significant bits and
> + * migratetype in the higher bits.

Hard for me to understand that comment. I would describe what the code does as
e,g, "Shift to encode migratetype and order in the same integer, with order in
the least significant bit ..." etc.

> + */
> +#define NR_PCP_ORDER_SHIFT 8

Also ORDER_SHIFT is a bit misnomer, it's more precisely an ORDER_WIDTH, and we
are shifting migratetype with it, not order. I'm just comparing with how we name
nid/zid/etc bits in page flags.

> +#define NR_PCP_ORDER_MASK ((1<<NR_PCP_ORDER_SHIFT) - 1)
> +
>  #define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + z->watermark_boost)
>  #define low_wmark_pages(z) (z->_watermark[WMARK_LOW] + z->watermark_boost)
>  #define high_wmark_pages(z) (z->_watermark[WMARK_HIGH] + z->watermark_boost)
> @@ -349,7 +367,7 @@ struct per_cpu_pages {
>  #endif
>  
>  	/* Lists of pages, one per migrate type stored on the pcp-lists */
> -	struct list_head lists[MIGRATE_PCPTYPES];
> +	struct list_head lists[NR_PCP_LISTS];
>  };
>  
>  struct per_cpu_zonestat {
> diff --git a/mm/internal.h b/mm/internal.h
> index 8fd61e344966..4f5c22dd8987 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -203,7 +203,7 @@ extern void post_alloc_hook(struct page *page, unsigned int order,
>  					gfp_t gfp_flags);
>  extern int user_min_free_kbytes;
>  
> -extern void free_unref_page(struct page *page);
> +extern void free_unref_page(struct page *page, unsigned int order);
>  extern void free_unref_page_list(struct list_head *list);
>  
>  extern void zone_pcp_update(struct zone *zone, int cpu_online);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 99ddac0ffece..ffd2d07060eb 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -687,10 +687,53 @@ static void bad_page(struct page *page, const char *reason)
>  	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>  }
>  
> +static inline unsigned int order_to_pindex(int migratetype, int order)
> +{
> +	int base = order;
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	if (order > PAGE_ALLOC_COSTLY_ORDER) {
> +		VM_BUG_ON(order != pageblock_order);
> +		base = PAGE_ALLOC_COSTLY_ORDER + 1;
> +	}
> +#else
> +	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
> +#endif
> +
> +	return (MIGRATE_PCPTYPES * base) + migratetype;
> +}
> +
> +static inline int pindex_to_order(unsigned int pindex)
> +{
> +	int order = pindex / PAGE_ALLOC_COSTLY_ORDER;

This seems wrong, shouldn't we divide by MIGRATE_PCPTYPES?
It just happens to be the same number, so testing won't flag this.




  reply	other threads:[~2021-06-03 11:12 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-03  8:46 [PATCH 0/2] Allow high order pages to be stored on PCP Mel Gorman
2021-06-03  8:46 ` [PATCH 1/2] mm/page_alloc: Move free_the_page Mel Gorman
2021-06-03 11:12   ` Vlastimil Babka
2021-06-03  8:46 ` [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists Mel Gorman
2021-06-03 11:12   ` Vlastimil Babka [this message]
2021-06-03 12:34     ` Mel Gorman
2021-06-03 13:04       ` Vlastimil Babka
  -- strict thread matches above, loose matches on Subject: below --
2021-06-03 14:22 [PATCH 0/2] Allow high order pages to be stored on PCP v2 Mel Gorman
2021-06-03 14:22 ` [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists Mel Gorman
2021-06-09 18:30   ` Zi Yan
2021-06-10 11:18     ` Mel Gorman
2021-06-10 11:40       ` Zi Yan
2021-06-10 22:59         ` Andrew Morton
2021-06-11  0:38           ` Stephen Rothwell
2021-06-11  8:10         ` Mel Gorman
2021-06-11  8:34         ` Mel Gorman
2021-06-11 12:17           ` Zi Yan
2021-06-11 13:58             ` Mel Gorman
2021-05-31 12:04 [RFC PATCH 0/2] Allow high order pages to be stored on PCP Mel Gorman
2021-05-31 12:04 ` [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists Mel Gorman
2021-05-31 15:23   ` Jesper Dangaard Brouer
2021-06-01 12:45     ` Mel Gorman
2021-06-02 13:53       ` Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1c15b083-26f8-4473-80e6-bcc2f549ba41@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=brouer@redhat.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).