From: Tetsuo Handa
To: Michal Hocko, linux-mm@kvack.org
Cc: Vlastimil Babka, Johannes Weiner, Mel Gorman, Andrew Morton, LKML, Michal Hocko, "Darrick J. Wong"
Subject: Re: [RFC PATCH 3/4] xfs: map KM_MAYFAIL to __GFP_RETRY_MAYFAIL
Date: Wed, 8 Mar 2017 20:23:37 +0900
In-Reply-To: <20170307154843.32516-4-mhocko@kernel.org>
References: <20170307154843.32516-1-mhocko@kernel.org> <20170307154843.32516-4-mhocko@kernel.org>

On 2017/03/08 0:48, Michal Hocko wrote:
> From: Michal Hocko
>
> KM_MAYFAIL didn't have any suitable GFP_FOO counterpart until recently
> so it relied on the default page allocator behavior for the given set
> of flags. This means that small allocations actually never failed.
>
> Now that we have __GFP_RETRY_MAYFAIL flag which works independently on the
> allocation request size we can map KM_MAYFAIL to it. The allocator will
> try as hard as it can to fulfill the request but fails eventually if
> the progress cannot be made.
>
> Cc: Darrick J. Wong
> Signed-off-by: Michal Hocko
> ---
>  fs/xfs/kmem.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> index ae08cfd9552a..ac80a4855c83 100644
> --- a/fs/xfs/kmem.h
> +++ b/fs/xfs/kmem.h
> @@ -54,6 +54,16 @@ kmem_flags_convert(xfs_km_flags_t flags)
>  		lflags &= ~__GFP_FS;
>  	}
>
> +	/*
> +	 * Default page/slab allocator behavior is to retry for ever
> +	 * for small allocations. We can override this behavior by using
> +	 * __GFP_RETRY_MAYFAIL which will tell the allocator to retry as long
> +	 * as it is feasible but rather fail than retry for ever for all
> +	 * request sizes.
> +	 */
> +	if (flags & KM_MAYFAIL)
> +		lflags |= __GFP_RETRY_MAYFAIL;

I don't see the advantage of supporting both __GFP_NORETRY and
__GFP_RETRY_MAYFAIL here. kmem_flags_convert() could always set
__GFP_NORETRY, because unless KM_MAYFAIL is set the callers already wrap
the allocation in an open-coded __GFP_NOFAIL loop (with a periodic
allocation-lockup warning); see the sketch below.

> +
>  	if (flags & KM_ZERO)
>  		lflags |= __GFP_ZERO;
>

Well, commit 9a67f6488eca926f ("mm: consolidate GFP_NOFAIL checks in the
allocator slowpath") unexpectedly changed the slowpath to always give up
without using memory reserves (unless __GFP_NOFAIL is set) when
TIF_MEMDIE is set on the current thread while it is inside
__alloc_pages_may_oom(). Precisely speaking, this happens when TIF_MEMDIE
becomes set after the

	if (gfp_pfmemalloc_allowed(gfp_mask))
		alloc_flags = ALLOC_NO_WATERMARKS;

lines and before the

	/* Avoid allocations with no watermarks from looping endlessly */
	if (test_thread_flag(TIF_MEMDIE))
		goto nopage;

lines, which is likely always the case. But this is off-topic for this
thread.
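For reference, the open-coded loop mentioned above looks roughly like
this. This is a sketch approximating fs/xfs/kmem.c's kmem_alloc() around
this series; the exact warning text and retry interval are from memory
and may differ:

	/* Sketch approximating fs/xfs/kmem.c:kmem_alloc(); not verbatim. */
	void *
	kmem_alloc(size_t size, xfs_km_flags_t flags)
	{
		int	retries = 0;
		gfp_t	lflags = kmem_flags_convert(flags);
		void	*ptr;

		do {
			ptr = kmalloc(size, lflags);
			/* KM_MAYFAIL callers get NULL back; everyone else loops. */
			if (ptr || (flags & (KM_MAYFAIL | KM_NOSLEEP)))
				return ptr;
			/* Periodic "possible memory allocation deadlock" warning. */
			if (!(++retries % 100))
				xfs_err(NULL,
			"possible memory allocation deadlock in %s (mode:0x%x)",
					__func__, lflags);
			congestion_wait(BLK_RW_ASYNC, HZ / 50);
		} while (1);
	}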
The lines which are executed only when __GFP_RETRY_MAYFAIL is set
(rather than __GFP_NORETRY) are:

	/* Do not loop if specifically requested */
	if (gfp_mask & __GFP_NORETRY)
		goto nopage;

	/*
	 * Do not retry costly high order allocations unless they are
	 * __GFP_RETRY_MAYFAIL
	 */
	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_RETRY_MAYFAIL))
		goto nopage;

	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
				 did_some_progress > 0, &no_progress_loops))
		goto retry;

	/*
	 * It doesn't make any sense to retry for the compaction if the order-0
	 * reclaim is not able to make any progress because the current
	 * implementation of the compaction depends on the sufficient amount
	 * of free memory (see __compaction_suitable)
	 */
	if (did_some_progress > 0 &&
			should_compact_retry(ac, order, alloc_flags,
				compact_result, &compact_priority,
				&compaction_retries))
		goto retry;

	/*
	 * It's possible we raced with cpuset update so the OOM would be
	 * premature (see below the nopage: label for full explanation).
	 */
	if (read_mems_allowed_retry(cpuset_mems_cookie))
		goto retry_cpuset;

	/* Reclaim has failed us, start killing things */
	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
	if (page)
		goto got_pg;

__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
	const struct alloc_context *ac, unsigned long *did_some_progress)
{
	struct oom_control oc = {
		.zonelist = ac->zonelist,
		.nodemask = ac->nodemask,
		.memcg = NULL,
		.gfp_mask = gfp_mask,
		.order = order,
	};
	struct page *page;

	*did_some_progress = 0;

	/*
	 * Acquire the oom lock.  If that fails, somebody else is
	 * making progress for us.
	 */
	if (!mutex_trylock(&oom_lock)) {
		*did_some_progress = 1;
		schedule_timeout_uninterruptible(1);
		return NULL;
	}

	/*
	 * Go through the zonelist yet one more time, keep very high watermark
	 * here, this is only to catch a parallel oom killing, we must fail if
	 * we're still under heavy pressure.
	 */
	page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
				      ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
	if (page)
		goto out;

	/* Coredumps can quickly deplete all memory reserves */
	if (current->flags & PF_DUMPCORE)
		goto out;
	/* The OOM killer will not help higher order allocs */
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		goto out;
	/*
	 * We have already exhausted all our reclaim opportunities without any
	 * success so it is time to admit defeat. We will skip the OOM killer
	 * because it is very likely that the caller has a more reasonable
	 * fallback than shooting a random task.
	 */
	if (gfp_mask & __GFP_RETRY_MAYFAIL)
		goto out;

where both __GFP_NORETRY and __GFP_RETRY_MAYFAIL are checked after
direct reclaim and compaction have failed. __GFP_RETRY_MAYFAIL
optimistically retries whenever one of should_reclaim_retry(),
should_compact_retry() or read_mems_allowed_retry() returns true, or
mutex_trylock(&oom_lock) in __alloc_pages_may_oom() returns 0.

If !__GFP_FS allocation requests keep passing oom_lock among themselves,
__GFP_RETRY_MAYFAIL allocation requests (which are likely !__GFP_FS
allocation requests, because __GFP_FS allocation requests tend to be
blocked on direct reclaim) can be blocked for an uncontrollable duration
without making progress.

It seems to me that the difference between __GFP_NORETRY and
__GFP_RETRY_MAYFAIL is not useful. Rather, the caller can set
__GFP_NORETRY and retry under whatever policy it wants (e.g. set
__GFP_HIGH upon the first timeout, give up upon the second timeout), as
in the sketch below.
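A minimal sketch of such a caller-controlled policy. The helper name,
base GFP flags and back-off delays are hypothetical, purely for
illustration:

	#include <linux/slab.h>
	#include <linux/delay.h>

	/*
	 * Hypothetical caller-controlled retry policy: fail fast with
	 * __GFP_NORETRY, escalate to __GFP_HIGH after the first failure,
	 * and give up after the second.  Names and delays are illustrative.
	 */
	static void *alloc_with_backoff(size_t size)
	{
		void *p;

		/* First attempt: do not loop inside the allocator. */
		p = kmalloc(size, GFP_NOFS | __GFP_NORETRY);
		if (p)
			return p;

		msleep(100);	/* arbitrary back-off */

		/* Second attempt: allow dipping into part of the reserves. */
		p = kmalloc(size, GFP_NOFS | __GFP_NORETRY | __GFP_HIGH);

		/* Still NULL?  Give up; the caller must handle failure. */
		return p;
	}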