From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton, Johannes Weiner, Rik van Riel, Vlastimil Babka,
	David Rientjes, Michal Hocko, Linux-MM, LKML
Date: Fri, 18 Sep 2015 15:25:41 +0900
Subject: Re: [PATCH 07/12] mm, page_alloc: Distinguish between being unable
	to sleep, unwilling to sleep and avoiding waking kswapd
Message-ID: <20150918062541.GA7769@js1304-P5Q-DELUXE>
In-Reply-To: <20150909122203.GY12432@techsingularity.net>
References: <1440418191-10894-1-git-send-email-mgorman@techsingularity.net>
	<1440418191-10894-8-git-send-email-mgorman@techsingularity.net>
	<20150909122203.GY12432@techsingularity.net>

On Wed, Sep 09, 2015 at 01:22:03PM +0100, Mel Gorman wrote:
> On Tue, Sep 08, 2015 at 03:49:58PM +0900, Joonsoo Kim wrote:
> > 2015-08-24 21:09 GMT+09:00 Mel Gorman <mgorman@techsingularity.net>:
> > > __GFP_WAIT has been used to identify atomic context in callers that hold
> > > spinlocks or are in interrupts. They are expected to be high priority and
> > > have access to one of two watermarks lower than "min", which can be
> > > referred to as the "atomic reserve". __GFP_HIGH users get access to the
> > > first lower watermark and can be called the "high priority reserve".
> > >
> > > Over time, callers had a requirement to not block when fallback options
> > > were available. Some have abused __GFP_WAIT, leading to a situation where
> > > an optimistic allocation with a fallback option can access atomic
> > > reserves.
> > >
> > > This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
> > > cannot sleep and have no alternative. High priority users continue to
> > > use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep
> > > and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
> > > callers that want to wake kswapd for background reclaim. __GFP_WAIT is
> > > redefined as a caller that is willing to enter direct reclaim and wake
> > > kswapd for background reclaim.
> >
> > Hello, Mel.
> >
> > I think it is better to do one thing per patch.
>
> This was a case where the incremental change felt unnecessary. The purpose
> of the patch is to "distinguish between being unable to sleep, unwilling
> to sleep and avoiding waking kswapd". Splitting that up is possible but
> I'm not convinced it helps.
>
> > To distinguish the truly atomic case, we just need to introduce
> > __GFP_ATOMIC, make GFP_ATOMIC equal to __GFP_ATOMIC | GFP_HARDER and
> > change the related things. The __GFP_WAIT changes aren't needed at all
> > for this purpose. That would reduce the patch size and give better
> > bisectability.
> >
> > And I don't think that introducing __GFP_KSWAPD_RECLAIM is a good thing.
> > Basically, kswapd reclaim should be enforced.
>
> Several years ago, I would have agreed. Now there are callers that want
> to control kswapd and I think it made more sense to clearly state whether
> RECLAIM and KSWAPD are allowed instead of having RECLAIM and NO_KSWAPD
> flags -- i.e. flags that consistently allow or consistently deny.
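
For readers following the thread, the proposed naming scheme amounts to
roughly the following (a sketch; the bit values here are illustrative,
not the patch's literal definitions):

  /* Sketch only: bit values are illustrative. */
  #define __GFP_ATOMIC         ((__force gfp_t)0x80000u)   /* truly atomic, no fallback */
  #define __GFP_DIRECT_RECLAIM ((__force gfp_t)0x400000u)  /* may sleep in direct reclaim */
  #define __GFP_KSWAPD_RECLAIM ((__force gfp_t)0x2000000u) /* may wake kswapd */

  /* A caller that can wait both reclaims directly and wakes kswapd. */
  #define __GFP_WAIT (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

  /* Truly atomic callers keep high priority access to the reserves. */
  #define GFP_ATOMIC (__GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM)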

> > The new flag makes things more difficult for users who manipulate gfp
> > flags manually. Without this change, your second hazard would disappear,
> > although it is almost harmless.
> >
> > And I doubt that this big one-shot change is preferable. AFAIK, even when
> > the changes are a one-to-one mapping with no functional difference, each
> > one is normally made as a separate patch and sent to the relevant
> > maintainer. I guess there is some difficulty in doing that for this
> > patch, but it could be done, couldn't it?
>
> Splitting this into one patch per maintainer would be a review and bisection
> nightmare. If I saw someone else doing that I would wonder if they were
> just trying to increase their patch count for no reason.
>
> > Some nitpicks are below.
> >
> > > diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> > > index 03e75fef15b8..86809bd2026d 100644
> > > --- a/arch/arm/xen/mm.c
> > > +++ b/arch/arm/xen/mm.c
> > > @@ -25,7 +25,7 @@
> > >  unsigned long xen_get_swiotlb_free_pages(unsigned int order)
> > >  {
> > >  	struct memblock_region *reg;
> > > -	gfp_t flags = __GFP_NOWARN;
> > > +	gfp_t flags = __GFP_NOWARN|___GFP_KSWAPD_RECLAIM;
> >
> > Please use __XXX rather than ___XXX.
>
> Fixed.
>
> > > @@ -457,13 +457,13 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> > >  	 * We solve this, and guarantee forward progress, with a rescuer
> > >  	 * workqueue per bio_set. If we go to allocate and there are
> > >  	 * bios on current->bio_list, we first try the allocation
> > > -	 * without __GFP_WAIT; if that fails, we punt those bios we
> > > -	 * would be blocking to the rescuer workqueue before we retry
> > > -	 * with the original gfp_flags.
> > > +	 * without __GFP_DIRECT_RECLAIM; if that fails, we punt those
> > > +	 * bios we would be blocking to the rescuer workqueue before
> > > +	 * we retry with the original gfp_flags.
> > >  	 */
> > >
> > >  	if (current->bio_list && !bio_list_empty(current->bio_list))
> > > -		gfp_mask &= ~__GFP_WAIT;
> > > +		gfp_mask &= ~__GFP_DIRECT_RECLAIM;
> >
> > How about introducing a helper function to mask out __GFP_DIRECT_RECLAIM?
> > It could be used in many places.
>
> In this case, the pattern for removing a single flag is easier to recognise
> than a helper whose implementation must be examined.
>
> > >  	p = mempool_alloc(bs->bio_pool, gfp_mask);
> > >  	if (!p && gfp_mask != saved_gfp) {
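
Just for clarity, the helper I had in mind would be a one-liner along these
lines (hypothetical name, sketch only; the patch open-codes the mask):

  /* Hypothetical helper; not part of the patch. */
  static inline gfp_t gfp_drop_direct_reclaim(gfp_t gfp_mask)
  {
          return gfp_mask & ~__GFP_DIRECT_RECLAIM;
  }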

> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 627ed0c593fb..e3605acaaffc 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -1156,8 +1156,8 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
> > >   * @bio: bio to allocate request for (can be %NULL)
> > >   * @gfp_mask: allocation mask
> > >   *
> > > - * Get a free request from @q. If %__GFP_WAIT is set in @gfp_mask, this
> > > - * function keeps retrying under memory pressure and fails iff @q is dead.
> > > + * Get a free request from @q. If %__GFP_DIRECT_RECLAIM is set in @gfp_mask,
> > > + * this function keeps retrying under memory pressure and fails iff @q is dead.
> > >   *
> > >   * Must be called with @q->queue_lock held and,
> > >   * Returns ERR_PTR on failure, with @q->queue_lock held.
> > > @@ -1177,7 +1177,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
> > >  	if (!IS_ERR(rq))
> > >  		return rq;
> > >
> > > -	if (!(gfp_mask & __GFP_WAIT) || unlikely(blk_queue_dying(q))) {
> > > +	if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
> > >  		blk_put_rl(rl);
> > >  		return rq;
> > >  	}
> > > @@ -1255,11 +1255,11 @@ EXPORT_SYMBOL(blk_get_request);
> > >   * BUG.
> > >   *
> > >   * WARNING: When allocating/cloning a bio-chain, careful consideration should be
> > > - * given to how you allocate bios. In particular, you cannot use __GFP_WAIT for
> > > - * anything but the first bio in the chain. Otherwise you risk waiting for IO
> > > - * completion of a bio that hasn't been submitted yet, thus resulting in a
> > > - * deadlock. Alternatively bios should be allocated using bio_kmalloc() instead
> > > - * of bio_alloc(), as that avoids the mempool deadlock.
> > > + * given to how you allocate bios. In particular, you cannot use
> > > + * __GFP_DIRECT_RECLAIM for anything but the first bio in the chain. Otherwise
> > > + * you risk waiting for IO completion of a bio that hasn't been submitted yet,
> > > + * thus resulting in a deadlock. Alternatively bios should be allocated using
> > > + * bio_kmalloc() instead of bio_alloc(), as that avoids the mempool deadlock.
> > >   * If possible a big IO should be split into smaller parts when allocation
> > >   * fails. Partial allocation should not be an error, or you risk a live-lock.
> > >   */
> > > diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> > > index 1a27f45ec776..381cb50a673c 100644
> > > --- a/block/blk-ioc.c
> > > +++ b/block/blk-ioc.c
> > > @@ -289,7 +289,7 @@ struct io_context *get_task_io_context(struct task_struct *task,
> > >  {
> > >  	struct io_context *ioc;
> > >
> > > -	might_sleep_if(gfp_flags & __GFP_WAIT);
> > > +	might_sleep_if(gfpflags_allow_blocking(gfp_flags));
> > >
> > >  	do {
> > >  		task_lock(task);
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 9b6e28830b82..a8b46659ce4e 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -264,7 +264,7 @@ static int bt_get(struct blk_mq_alloc_data *data,
> > >  	if (tag != -1)
> > >  		return tag;
> > >
> > > -	if (!(data->gfp & __GFP_WAIT))
> > > +	if (!gfpflags_allow_blocking(data->gfp))
> > >  		return -1;
> > >
> > >  	bs = bt_wait_ptr(bt, hctx);
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 7d842db59699..7d80379d7a38 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -85,7 +85,7 @@ static int blk_mq_queue_enter(struct request_queue *q, gfp_t gfp)
> > >  	if (percpu_ref_tryget_live(&q->mq_usage_counter))
> > >  		return 0;
> > >
> > > -	if (!(gfp & __GFP_WAIT))
> > > +	if (!gfpflags_allow_blocking(gfp))
> > >  		return -EBUSY;
> > >
> > >  	ret = wait_event_interruptible(q->mq_freeze_wq,
> > > @@ -261,11 +261,11 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw, gfp_t gfp,
> > >
> > >  	ctx = blk_mq_get_ctx(q);
> > >  	hctx = q->mq_ops->map_queue(q, ctx->cpu);
> > > -	blk_mq_set_alloc_data(&alloc_data, q, gfp & ~__GFP_WAIT,
> > > +	blk_mq_set_alloc_data(&alloc_data, q, gfp & ~__GFP_DIRECT_RECLAIM,
> > >  			reserved, ctx, hctx);
> > >
> > >  	rq = __blk_mq_alloc_request(&alloc_data, rw);
> > > -	if (!rq && (gfp & __GFP_WAIT)) {
> > > +	if (!rq && (gfp & __GFP_DIRECT_RECLAIM)) {
> > >  		__blk_mq_run_hw_queue(hctx);
> > >  		blk_mq_put_ctx(ctx);
> >
> > Is there any reason not to use gfpflags_allow_blocking() here?
> > There are some places not using this helper, and the reason isn't
> > specified.
>
> Strictly speaking the helper could be used. However, in cases where the
> same function manipulates or examines the flag in any way, I did not use
> the helper. In all those cases, I thought the final result was easier
> to follow.
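
For reference, the helper itself reduces to a test of the direct reclaim
bit (a sketch based on my reading of the patch):

  /* Blocking is allowed iff the caller may enter direct reclaim. */
  static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
  {
          return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
  }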

> > >  /*
> > > + * A caller that is willing to wait may enter direct reclaim and will
> > > + * wake kswapd to reclaim pages in the background until the high
> > > + * watermark is met. A caller may wish to clear __GFP_DIRECT_RECLAIM to
> > > + * avoid unnecessary delays when a fallback option is available but
> > > + * still allow kswapd to reclaim in the background. The kswapd flag
> > > + * can be cleared when the reclaiming of pages would cause unnecessary
> > > + * disruption.
> > > + */
> > > +#define __GFP_WAIT (__GFP_DIRECT_RECLAIM|__GFP_KSWAPD_RECLAIM)
> >
> > Convention is that combinations of gfp flags don't use the __XXX prefix.
>
> I don't understand. GFP_MOVABLE_MASK, GFP_USER and a bunch of other
> combinations use __XXX.

Hello, Mel. Sorry for the late response.

Yes, a GFP_XXX flag can consist of multiple __GFP_XXX flags, but a
__GFP_XXX flag doesn't consist of multiple __GFP_YYY flags. Your
__GFP_WAIT seems to be the first one.
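
To illustrate the convention, compare the existing composite flags
(excerpts from memory of include/linux/gfp.h, so illustrative only):

  /* Composite flags use the plain GFP_XXX form... */
  #define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)
  #define GFP_USER   (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)

  /* ...while each __GFP_XXX has so far named a single bit. Your
   * __GFP_WAIT would be the first double-underscore name that expands
   * to two bits. */

Thanks.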