From: Andrea Arcangeli <aarcange@redhat.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Alex Williamson <alex.williamson@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH 2/2] mm: thp: fix transparent_hugepage/defrag = madvise || always
Date: Wed, 22 Aug 2018 10:24:46 -0400
Message-ID: <20180822142446.GL13047@redhat.com>
In-Reply-To: <20180822110737.GK29735@dhcp22.suse.cz>

On Wed, Aug 22, 2018 at 01:07:37PM +0200, Michal Hocko wrote:
> On Wed 22-08-18 11:02:14, Michal Hocko wrote:
> > On Tue 21-08-18 17:40:49, Andrea Arcangeli wrote:
> > > On Tue, Aug 21, 2018 at 01:50:57PM +0200, Michal Hocko wrote:
> > [...]
> > > > I really detest a new gfp flag for one time semantic that is muddy as
> > > > hell.
> > > 
> > > Well there's no way to fix this other than to prevent reclaim from
> > > running, if you still want to give page faults a chance to obtain THP
> > > under MADV_HUGEPAGE in the page fault without waiting minutes or hours
> > > for khugepaged to catch up with it.
> > 
> > I do not get that part. Why should the caller even care about reclaim
> > vs. compaction? How can you even make an educated guess what makes more
> > sense? This should be fully controlled by the allocator path. The caller
> > should only care about how hard to try. It's been some time since I've
> > looked but we used to have gfp flags to tell that for THP allocations
> > as well.
> 
> In other words, why do we even try to swap out when allocating a costly
> high order page for requests which do not insist on trying really hard?

Note that the testcase with vfio swaps nothing and writes nothing to
disk. No memory at all is being swapped or freed because 100% of the
node is pinned with GUP pins, so I doubt this could possibly move
the needle for the reproducer that I used for the benchmark.

I suggested the swap storm to you as a reproducer because it's another
way the bug can surface and it's easier to reproduce without requiring
device assignment, but the badness is the fact that reclaim is called
when it shouldn't be, and whatever fix we pick must cover vfio too. I
can't imagine how the patch below could possibly have an effect on
vfio, and even in the swap storm case you're converting a swap storm
into CPU waste: the allocations will still be extremely slow, just
like with vfio.
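
To make the vfio argument concrete, here is a minimal userspace sketch
of a reclaim pass over a node where every page is pinned. It is a toy
model, not kernel code: struct toy_page, struct toy_scan_control and
toy_reclaim() are hypothetical names invented for this illustration,
and only may_swap is modelled (the quoted patch also clears may_unmap).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only, not kernel types. */
struct toy_page {
	bool pinned;	/* holds a GUP pin: cannot be reclaimed */
	bool anon;	/* anonymous memory: freeing it would need swap */
};

struct toy_scan_control {
	bool may_swap;	/* loosely mirrors sc.may_swap in mm/vmscan.c */
};

struct toy_result {
	size_t scanned;	/* pages the pass had to look at (CPU cost) */
	size_t freed;	/* pages the pass actually freed */
};

/* One pass over the node: every page is scanned, few may be freed. */
static struct toy_result toy_reclaim(const struct toy_page *pages, size_t n,
				     const struct toy_scan_control *sc)
{
	struct toy_result res = { 0, 0 };

	assert(sc != NULL);
	assert(pages != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		res.scanned++;
		if (pages[i].pinned)
			continue;	/* pinned: skipped whatever sc says */
		if (pages[i].anon && !sc->may_swap)
			continue;	/* anon page, swapping not allowed */
		res.freed++;
	}
	return res;
}
```

With every page marked pinned, toy_reclaim() reports freed == 0 and
scanned == n for both settings of may_swap: the flag changes nothing
when nothing is reclaimable, and the scanning cost is still paid.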

The effect of the below should be evaluated regardless of the issue
we've been discussing in this thread, and it's a new corner case for
order > PAGE_ALLOC_COSTLY_ORDER. I don't much like order >
PAGE_ALLOC_COSTLY_ORDER checks: those are arbitrary numbers, and the
more such checks are needed in various places, the more it's a sign
that the VM is bad and arbitrary, with one more corner case required
to hide some badness. But again, this will have effects unrelated to
what we're discussing here; it will just convert I/O into CPU waste
and have no effect on vfio.

> 
> I mean why don't we do something like this?
> ---
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 03822f86f288..41005d3d4c2d 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3071,6 +3071,14 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  	if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
>  		return 1;
>  
> +	/*
> +	 * If we are allocating a costly order and do not insist on trying really
> +	 * hard then we should keep the reclaim impact at minimum. So only
> +	 * focus on easily reclaimable memory.
> +	 */
> +	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_RETRY_MAYFAIL))
> +		sc.may_swap = sc.may_unmap = 0;
> +
>  	trace_mm_vmscan_direct_reclaim_begin(order,
>  				sc.may_writepage,
>  				sc.gfp_mask,
> -- 
> Michal Hocko
> SUSE Labs
> 
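
For reference, the condition in the quoted hunk can be evaluated in
isolation. The following is a hedged userspace sketch, not the kernel
implementation: toy_limits_reclaim() and TOY_GFP_RETRY_MAYFAIL are
hypothetical names, and the flag bit is a placeholder rather than the
kernel's real __GFP_RETRY_MAYFAIL value. PAGE_ALLOC_COSTLY_ORDER is 3
as in the kernel, and the order-9 case assumes a 2MB THP on x86-64
with 4KB base pages.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_ALLOC_COSTLY_ORDER	3		/* same value as the kernel */
#define TOY_GFP_RETRY_MAYFAIL	(1u << 0)	/* placeholder bit, NOT the kernel's */

/*
 * Returns true when the quoted check would clear may_swap/may_unmap:
 * a costly-order request that did not ask to retry hard.
 */
static bool toy_limits_reclaim(int order, unsigned int gfp_mask)
{
	assert(order >= 0);

	return order > PAGE_ALLOC_COSTLY_ORDER &&
	       !(gfp_mask & TOY_GFP_RETRY_MAYFAIL);
}
```

Under these assumptions toy_limits_reclaim(9, 0) is true, while both
toy_limits_reclaim(9, TOY_GFP_RETRY_MAYFAIL) and
toy_limits_reclaim(3, 0) are false: only costly-order requests
without the retry flag take the restricted path.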

Thread overview: 61+ messages
2018-08-20  3:22 [PATCH 0/2] fix for "pathological THP behavior" Andrea Arcangeli
2018-08-20  3:22 ` [PATCH 1/2] mm: thp: consolidate policy_nodemask call Andrea Arcangeli
2018-08-20  3:22 ` [PATCH 2/2] mm: thp: fix transparent_hugepage/defrag = madvise || always Andrea Arcangeli
2018-08-20  3:26   ` [PATCH 0/1] fix for "pathological THP behavior" v2 Andrea Arcangeli
2018-08-20  3:26     ` [PATCH 1/1] mm: thp: fix transparent_hugepage/defrag = madvise || always Andrea Arcangeli
2018-08-20 12:35   ` [PATCH 2/2] " Zi Yan
2018-08-20 15:32     ` Andrea Arcangeli
2018-08-21 11:50   ` Michal Hocko
2018-08-21 21:40     ` Andrea Arcangeli
2018-08-22  9:02       ` Michal Hocko
2018-08-22 11:07         ` Michal Hocko
2018-08-22 14:24           ` Andrea Arcangeli [this message]
2018-08-22 14:45             ` Michal Hocko
2018-08-22 15:24               ` Andrea Arcangeli
2018-08-23 10:50                 ` Michal Hocko
2018-08-22 15:52         ` Andrea Arcangeli
2018-08-23 10:52           ` Michal Hocko
2018-08-28  7:53             ` Michal Hocko
2018-08-28  8:18               ` Michal Hocko
2018-08-28  8:54                 ` Stefan Priebe - Profihost AG
2018-08-29 11:11                   ` Stefan Priebe - Profihost AG
     [not found]                 ` <D5F4A33C-0A37-495C-9468-D6866A862097@cs.rutgers.edu>
2018-08-29 14:28                   ` Michal Hocko
2018-08-29 14:35                     ` Michal Hocko
2018-08-29 15:22                       ` Zi Yan
2018-08-29 15:47                         ` Michal Hocko
2018-08-29 16:06                           ` Zi Yan
2018-08-29 16:25                             ` Michal Hocko
2018-08-29 19:24                               ` [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings Michal Hocko
2018-08-29 22:54                                 ` Zi Yan
2018-08-30  7:00                                   ` Michal Hocko
2018-08-30 13:22                                     ` Zi Yan
2018-08-30 13:45                                       ` Michal Hocko
2018-08-30 14:02                                         ` Zi Yan
2018-08-30 16:19                                           ` Stefan Priebe - Profihost AG
2018-08-30 16:40                                           ` Michal Hocko
2018-09-05  3:44                                             ` Andrea Arcangeli
2018-09-05  7:08                                               ` Michal Hocko
2018-09-06 11:10                                                 ` Vlastimil Babka
2018-09-06 11:16                                                   ` Vlastimil Babka
2018-09-06 11:25                                                     ` Michal Hocko
2018-09-06 12:35                                                       ` Zi Yan
2018-09-06 10:59                                   ` Vlastimil Babka
2018-09-06 11:17                                     ` Zi Yan
2018-08-30  6:47                                 ` Michal Hocko
2018-09-06 11:18                                   ` Vlastimil Babka
2018-09-06 11:27                                     ` Michal Hocko
2018-09-12 17:29                                 ` Mel Gorman
2018-09-17  6:11                                   ` Michal Hocko
2018-09-17  7:04                                     ` Stefan Priebe - Profihost AG
2018-09-17  9:32                                       ` Stefan Priebe - Profihost AG
2018-09-17 11:27                                       ` Michal Hocko
2018-08-20 11:58 ` [PATCH 0/2] fix for "pathological THP behavior" Kirill A. Shutemov
2018-08-20 15:19   ` Andrea Arcangeli
2018-08-21 15:30     ` Vlastimil Babka
2018-08-21 17:26       ` David Rientjes
2018-08-21 22:18         ` Andrea Arcangeli
2018-08-21 22:05       ` Andrea Arcangeli
2018-08-22  9:24       ` Michal Hocko
2018-08-22 15:56         ` Andrea Arcangeli
2018-08-20 19:06   ` Yang Shi
2018-08-20 23:24     ` Andrea Arcangeli
