From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: Michal Hocko <mhocko@kernel.org>, linux-mm@kvack.org
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>
Subject: Re: [RFC PATCH 3/4] xfs: map KM_MAYFAIL to __GFP_RETRY_MAYFAIL
Date: Wed, 8 Mar 2017 20:23:37 +0900	[thread overview]
Message-ID: <e7f932bf-313a-917d-6304-81528aca5994@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20170307154843.32516-4-mhocko@kernel.org>

On 2017/03/08 0:48, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> KM_MAYFAIL didn't have any suitable GFP_FOO counterpart until recently,
> so it relied on the default page allocator behavior for the given set
> of flags. This meant that small allocations never actually failed.
> 
> Now that we have the __GFP_RETRY_MAYFAIL flag, which works independently
> of the allocation request size, we can map KM_MAYFAIL to it. The allocator
> will try as hard as it can to fulfill the request but will eventually fail
> if no progress can be made.
> 
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  fs/xfs/kmem.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> index ae08cfd9552a..ac80a4855c83 100644
> --- a/fs/xfs/kmem.h
> +++ b/fs/xfs/kmem.h
> @@ -54,6 +54,16 @@ kmem_flags_convert(xfs_km_flags_t flags)
>  			lflags &= ~__GFP_FS;
>  	}
>  
> +	/*
> +	 * Default page/slab allocator behavior is to retry forever for
> +	 * small allocations. We can override this behavior by using
> +	 * __GFP_RETRY_MAYFAIL, which tells the allocator to retry as long
> +	 * as it is feasible but, for all request sizes, to fail rather
> +	 * than retry forever.
> +	 */
> +	if (flags & KM_MAYFAIL)
> +		lflags |= __GFP_RETRY_MAYFAIL;

I don't see the advantage of supporting both __GFP_NORETRY and __GFP_RETRY_MAYFAIL.
kmem_flags_convert() can always set __GFP_NORETRY, because the callers use an
open-coded __GFP_NOFAIL loop (with a possible allocation lockup warning) unless
KM_MAYFAIL is set.
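
For reference, the open-coded loop I mean is kmem_alloc() in fs/xfs/kmem.c,
which does roughly the following (a simplified sketch, not the exact source;
the warning format in particular is abbreviated):

        void *
        kmem_alloc(size_t size, xfs_km_flags_t flags)
        {
                int     retries = 0;
                gfp_t   lflags = kmem_flags_convert(flags);
                void    *ptr;

                do {
                        ptr = kmalloc(size, lflags);
                        /* Only KM_MAYFAIL/KM_NOSLEEP callers see failure. */
                        if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
                                return ptr;
                        /* Everybody else retries forever, warning periodically. */
                        if (!(++retries % 100))
                                xfs_err(NULL,
                "possible memory allocation deadlock in %s (mode:0x%x)",
                                        __func__, lflags);
                        congestion_wait(BLK_RW_ASYNC, HZ/50);
                } while (1);
        }

Setting __GFP_NORETRY unconditionally in kmem_flags_convert() would make this
loop the only retry logic for the !KM_MAYFAIL callers.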

> +
>  	if (flags & KM_ZERO)
>  		lflags |= __GFP_ZERO;
>  
> 

Well, commit 9a67f6488eca926f ("mm: consolidate GFP_NOFAIL checks in the
allocator slowpath") unexpectedly changed the slowpath to always give up
without using memory reserves (unless __GFP_NOFAIL is set) when TIF_MEMDIE
is set on the current thread inside __alloc_pages_may_oom() (precisely
speaking, when TIF_MEMDIE is set while the current thread is past the

        if (gfp_pfmemalloc_allowed(gfp_mask))
                alloc_flags = ALLOC_NO_WATERMARKS;

lines but has not yet reached the

        /* Avoid allocations with no watermarks from looping endlessly */
        if (test_thread_flag(TIF_MEMDIE))
                goto nopage;

lines, which is likely always the case); but this is off-topic for this thread.

The lines that are executed only when __GFP_RETRY_MAYFAIL rather than
__GFP_NORETRY is set are:

        /* Do not loop if specifically requested */
        if (gfp_mask & __GFP_NORETRY)
                goto nopage;

        /*
         * Do not retry costly high order allocations unless they are
         * __GFP_RETRY_MAYFAIL
         */
        if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_RETRY_MAYFAIL))
                goto nopage;

        if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
                                 did_some_progress > 0, &no_progress_loops))
                goto retry;

        /*
         * It doesn't make any sense to retry for the compaction if the order-0
         * reclaim is not able to make any progress because the current
         * implementation of the compaction depends on the sufficient amount
         * of free memory (see __compaction_suitable)
         */
        if (did_some_progress > 0 &&
                        should_compact_retry(ac, order, alloc_flags,
                                compact_result, &compact_priority,
                                &compaction_retries))
                goto retry;

        /*
         * It's possible we raced with cpuset update so the OOM would be
         * premature (see below the nopage: label for full explanation).
         */
        if (read_mems_allowed_retry(cpuset_mems_cookie))
                goto retry_cpuset;

        /* Reclaim has failed us, start killing things */
        page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
        if (page)
                goto got_pg;

static inline struct page *
__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
        const struct alloc_context *ac, unsigned long *did_some_progress)
{
        struct oom_control oc = {
                .zonelist = ac->zonelist,
                .nodemask = ac->nodemask,
                .memcg = NULL,
                .gfp_mask = gfp_mask,
                .order = order,
        };
        struct page *page;

        *did_some_progress = 0;

        /*
         * Acquire the oom lock.  If that fails, somebody else is
         * making progress for us.
         */
        if (!mutex_trylock(&oom_lock)) {
                *did_some_progress = 1;
                schedule_timeout_uninterruptible(1);
                return NULL;
        }

        /*
         * Go through the zonelist yet one more time, keep very high watermark
         * here, this is only to catch a parallel oom killing, we must fail if
         * we're still under heavy pressure.
         */
        page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
                                        ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
        if (page)
                goto out;

        /* Coredumps can quickly deplete all memory reserves */
        if (current->flags & PF_DUMPCORE)
                goto out;
        /* The OOM killer will not help higher order allocs */
        if (order > PAGE_ALLOC_COSTLY_ORDER)
                goto out;
        /*
         * We have already exhausted all our reclaim opportunities without any
         * success so it is time to admit defeat. We will skip the OOM killer
         * because it is very likely that the caller has a more reasonable
         * fallback than shooting a random task.
         */
        if (gfp_mask & __GFP_RETRY_MAYFAIL)
                goto out;

where both __GFP_NORETRY and __GFP_RETRY_MAYFAIL are checked after direct
reclaim and compaction have failed. __GFP_RETRY_MAYFAIL optimistically
retries whenever one of should_reclaim_retry(), should_compact_retry() or
read_mems_allowed_retry() returns true, or mutex_trylock(&oom_lock) in
__alloc_pages_may_oom() fails. If !__GFP_FS allocation requests keep
passing oom_lock among themselves, __GFP_RETRY_MAYFAIL allocation requests
(which are likely !__GFP_FS requests, since __GFP_FS requests tend to be
blocked in direct reclaim) can be blocked for an uncontrollable duration
without making progress. It seems to me that the difference between
__GFP_NORETRY and __GFP_RETRY_MAYFAIL is not useful. Rather, the caller
can set __GFP_NORETRY and implement whatever retry policy it wants (e.g.
set __GFP_HIGH upon the first timeout and give up upon the second timeout).
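
To illustrate that last point, a KM_MAYFAIL caller could implement its own
policy on top of __GFP_NORETRY along these lines (a hypothetical sketch only;
kmem_alloc_mayfail(), the two-stage policy and the delay value are made up
for illustration):

        void *
        kmem_alloc_mayfail(size_t size)
        {
                /* First attempt: no retry loop inside the allocator. */
                void *ptr = kmalloc(size, GFP_KERNEL | __GFP_NORETRY);

                if (ptr)
                        return ptr;
                /* Wait briefly, then try once more with access to reserves. */
                schedule_timeout_uninterruptible(HZ / 10);
                ptr = kmalloc(size, GFP_KERNEL | __GFP_NORETRY | __GFP_HIGH);
                /* Second failure: give up; the caller must handle NULL. */
                return ptr;
        }

This keeps the retry duration bounded and under the caller's control instead
of depending on oom_lock contention inside the page allocator.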

