linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages
Date: Thu, 26 Sep 2019 12:03:37 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.21.1909261149380.39830@chino.kir.corp.google.com> (raw)
In-Reply-To: <20190925070817.GH23050@dhcp22.suse.cz>

On Wed, 25 Sep 2019, Michal Hocko wrote:

> I am especially interested about this part. The more I think about this
> the more I am convinced that the underlying problem really is in the pre
> mature fallback in the fast path.

I appreciate you taking the time to continue to look at this but I'm 
confused about the underlying problem you're referring to: we had no 
underlying problem until 5.3 was released so we need to carry patches that 
will revert this behavior (we simply can't tolerate double digit memory 
access latency regressions lol).

If you're referring to post-5.3 behavior, this appears to override 
alloc_hugepage_direct_gfpmask() but directly in the page allocator.

static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr)
{
...
        /*
         * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not
         * specified, to express a general desire to stay on the current

Your patch is setting __GFP_THISNODE for __GFP_DIRECT_RECLAIM: this 
allocation will fail in the fastpath for both my case (fragmented local 
node) and Andrea's case (out of memory local node).  The first 
get_page_from_freelist() will then succeed in the slowpath for both cases; 
compaction is not tried for either.

In my case, that results in a perpetual remote access latency that we 
can't tolerate.  If Andrea's remote nodes are fragmented or low on memory, 
his case encounters swap storms over both the local node and remote nodes.

So I'm not really sure what is solved by your patch?

We're not on 5.3, but I can try it and collect data on exactly how poorly 
it performs on fragmented *hosts* (not single nodes, but really the whole 
system) because swap storms were never fixed here, it was only papered 
over.

> Does the almost-patch below helps your
> workload? It effectively reduces the fast path for higher order
> allocations to the local/requested node. The justification is that
> watermark check might be too strict for those requests as it is primary
> order-0 oriented. Low watermark target simply has no meaning for the
> higher order requests AFAIU. The min-low gap is giving kswapd a chance
> to balance and be more local node friendly while we do not have anything
> like that in compaction.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ff5484fdbdf9..09036cf55fca 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4685,7 +4685,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  {
>  	struct page *page;
>  	unsigned int alloc_flags = ALLOC_WMARK_LOW;
> -	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> +	gfp_t fastpath_mask, alloc_mask; /* The gfp_t that was actually used for allocation */
>  	struct alloc_context ac = { };
>  
>  	/*
> @@ -4698,7 +4698,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  	}
>  
>  	gfp_mask &= gfp_allowed_mask;
> -	alloc_mask = gfp_mask;
> +	fastpath_mask = alloc_mask = gfp_mask;
>  	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
>  		return NULL;
>  
> @@ -4710,8 +4710,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  	 */
>  	alloc_flags |= alloc_flags_nofragment(ac.preferred_zoneref->zone, gfp_mask);
>  
> -	/* First allocation attempt */
> -	page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
> +	/*
> +	 * First allocation attempt. If we have a high order allocation then do not fall
> +	 * back to a remote node just based on the watermark check on the requested node
> +	 * because compaction might easily free up a requested order and then it would be
> +	 * better to simply go to the slow path.
> +	 * TODO: kcompactd should help here but nobody has woken it up unless we hit the
> +	 * slow path so we might need some tuning there as well.
> +	 */
> +	if (order && (gfp_mask & __GFP_DIRECT_RECLAIM))
> +		fastpath_mask |= __GFP_THISNODE;
> +	page = get_page_from_freelist(fastpath_mask, order, alloc_flags, &ac);
>  	if (likely(page))
>  		goto out;

  reply	other threads:[~2019-09-26 19:03 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04 19:54 [patch for-5.3 0/4] revert immediate fallback to remote hugepages David Rientjes
2019-09-04 19:54 ` [rfc 3/4] mm, page_alloc: avoid expensive reclaim when compaction may not succeed David Rientjes
2019-09-05  9:00   ` Michal Hocko
2019-09-05 11:22     ` Vlastimil Babka
2019-09-05 20:53       ` Mike Kravetz
2019-09-06 20:16         ` David Rientjes
2019-09-06 20:49       ` David Rientjes
2019-09-04 20:43 ` [patch for-5.3 0/4] revert immediate fallback to remote hugepages Linus Torvalds
2019-09-05 20:54   ` David Rientjes
2019-09-07 19:51     ` David Rientjes
2019-09-07 19:55       ` Linus Torvalds
2019-09-08  1:50         ` David Rientjes
2019-09-08 12:47           ` Vlastimil Babka
2019-09-08 20:45             ` David Rientjes
2019-09-09  8:37               ` Michal Hocko
2019-09-04 20:55 ` Andrea Arcangeli
2019-09-05 21:06   ` David Rientjes
2019-09-09 19:30     ` Michal Hocko
2019-09-25  7:08       ` Michal Hocko
2019-09-26 19:03         ` David Rientjes [this message]
2019-09-27  7:48           ` Michal Hocko
2019-09-28 20:59             ` Linus Torvalds
2019-09-30 11:28               ` Michal Hocko
2019-10-01  5:43                 ` Michal Hocko
2019-10-01  8:37                   ` Michal Hocko
2019-10-18 14:15                     ` Michal Hocko
2019-10-23 11:03                       ` Vlastimil Babka
2019-10-24 18:59                         ` David Rientjes
2019-10-29 14:14                           ` Vlastimil Babka
2019-10-29 15:15                             ` Michal Hocko
2019-10-29 21:33                               ` Andrew Morton
2019-10-29 21:45                                 ` Vlastimil Babka
2019-10-29 23:25                                 ` David Rientjes
2019-11-05 13:02                                   ` Michal Hocko
2019-11-06  1:01                                     ` David Rientjes
2019-11-06  7:35                                       ` Michal Hocko
2019-11-06 21:32                                         ` David Rientjes
2019-11-13 11:20                                           ` Mel Gorman
2019-11-25  0:10                                             ` David Rientjes
2019-11-25 11:47                                               ` Michal Hocko
2019-11-25 20:38                                                 ` David Rientjes
2019-11-25 21:34                                                   ` Vlastimil Babka
2019-10-01 13:50                   ` Vlastimil Babka
2019-10-01 20:31                     ` David Rientjes
2019-10-01 21:54                       ` Vlastimil Babka
2019-10-02 10:34                         ` Michal Hocko
2019-10-02 22:32                           ` David Rientjes
2019-10-03  8:00                             ` Vlastimil Babka
2019-10-04 12:18                               ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1909261149380.39830@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).