From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 14 Jul 2016 14:23:32 +0900
From: Joonsoo Kim
To: Mel Gorman
Cc: Andrew Morton, Linux-MM, Rik van Riel, Vlastimil Babka,
	Johannes Weiner, LKML
Subject: Re: [PATCH 08/31] mm, vmscan: simplify the logic deciding whether kswapd sleeps
Message-ID: <20160714052332.GA29676@js1304-P5Q-DELUXE>
References: <1467403299-25786-1-git-send-email-mgorman@techsingularity.net>
	<1467403299-25786-9-git-send-email-mgorman@techsingularity.net>
	<20160707012038.GB27987@js1304-P5Q-DELUXE>
	<20160707101701.GR11498@techsingularity.net>
	<20160708024447.GB2370@js1304-P5Q-DELUXE>
	<20160708101147.GD11498@techsingularity.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160708101147.GD11498@techsingularity.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 08, 2016 at 11:11:47AM +0100, Mel Gorman wrote:
> On Fri, Jul 08, 2016 at 11:44:47AM +0900, Joonsoo Kim wrote:
> > > > > @@ -3390,12 +3386,24 @@ static int kswapd(void *p)
> > > > >  		 * We can speed up thawing tasks if we don't call balance_pgdat
> > > > >  		 * after returning from the refrigerator
> > > > >  		 */
> > > > > -		if (!ret) {
> > > > > -			trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
> > > > > +		if (ret)
> > > > > +			continue;
> > > > >
> > > > > -			/* return value ignored until next patch */
> > > > > -			balance_pgdat(pgdat, order, classzone_idx);
> > > > > -		}
> > > > > +		/*
> > > > > +		 * Reclaim begins at the requested order but if a high-order
> > > > > +		 * reclaim fails then kswapd falls back to reclaiming for
> > > > > +		 * order-0. If that happens, kswapd will consider sleeping
> > > > > +		 * for the order it finished reclaiming at (reclaim_order)
> > > > > +		 * but kcompactd is woken to compact for the original
> > > > > +		 * request (alloc_order).
> > > > > +		 */
> > > > > +		trace_mm_vmscan_kswapd_wake(pgdat->node_id, alloc_order);
> > > > > +		reclaim_order = balance_pgdat(pgdat, alloc_order, classzone_idx);
> > > > > +		if (reclaim_order < alloc_order)
> > > > > +			goto kswapd_try_sleep;
> > > >
> > > > This 'goto' would cause kswapd to sleep prematurely. We need to check
> > > > *new* pgdat->kswapd_order and classzone_idx even in this case.
> > > >
> > >
> > > It only matters if the next request coming is also high-order requests but
> > > one thing that needs to be avoided is kswapd staying awake periods of time
> > > constantly reclaiming for high-order pages. This is why the check means
> > > "If we reclaimed for high-order and failed, then consider sleeping now".
> > > If allocations still require it, they direct reclaim instead.
> >
> > But, assume that next request is zone-constrained allocation. We need
> > to balance memory for it but kswapd would skip it.
> >
>
> Then it'll also be woken up again in the very near future as the
> zone-constrained allocation. If the zone is at the min watermark, then
> it'll have direct reclaimed but between min and low, it'll be a simple
> wakeup.
>
> The premature sleep, wakeup with new requests logic was a complete mess.
> However, what I did do is remove the -1 handling of kswapd_classzone_idx
> handling and the goto full-sleep. In the event of a premature wakeup,
> it'll recheck for wakeups and if none has occurred, it'll use the old
> classzone information.
>
> Note that it will *not* use the original allocation order if it's a
> premature sleep. This is because it's known that high-order reclaim
> failed in the near past and restarting it has a high risk of
> overreclaiming.
>
> > > > And, I'd like to know why max() is used for classzone_idx rather than
> > > > min()? I think that kswapd should balance the lowest zone requested.
> > > >
> > >
> > > If there are two allocation requests -- one zone-constrained and the other
> > > zone-unconstrained, it does not make sense to have kswapd skip the pages
> > > usable for the zone-unconstrained and waste a load of CPU. You could
> >
> > I agree that, in this case, it's not good to skip the pages usable
> > for the zone-unconstrained request. But, what I am concerned is that
> > kswapd stop reclaim prematurely in the view of zone-constrained
> > requestor.
>
> It doesn't stop reclaiming for the lower zones. It's reclaiming the LRU
> for the whole node that may or may not have lower zone pages at the end
> of the LRU. If it does, then the allocation request will be satisfied.
> If it does not, then kswapd will think the node is balanced and get
> rewoken to do a zone-constrained reclaim pass.

If the zone-constrained request could enter direct reclaim, there would
be no problem. But please assume that the request is zone-constrained
and lacks __GFP_DIRECT_RECLAIM, which is common in some device driver
implementations. And please assume one more thing: this request always
arrives together with a zone-unconstrained allocation request. In this
case, your max() logic will set kswapd_classzone_idx to the highest zone
index, and the re-woken kswapd will not balance the low zone again. In
the end, the zone-constrained allocation request without
__GFP_DIRECT_RECLAIM could fail.

Thanks.
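
The scenario above can be made concrete with a toy userspace model. This is only a sketch under stated assumptions, not kernel code: every `toy_*` name, the two-zone layout, and in particular the rule that the node counts as balanced once any zone at or below the merged classzone_idx has reached its high watermark are hypothetical simplifications chosen to illustrate the concern. It compares merging two pending wakeup requests with max() and with min().

```c
#include <stdbool.h>

/* Toy userspace model; every name here is hypothetical, not kernel code. */
enum { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_NR_ZONES };

struct toy_zone {
	long free_pages;
	long high_wmark;
};

struct toy_node {
	struct toy_zone zones[TOY_NR_ZONES];
	int kswapd_classzone_idx;	/* pending request, -1 if none */
};

/* Merge a new wakeup request with the pending one using max() or min(). */
static void toy_wakeup_kswapd(struct toy_node *node, int classzone_idx,
			      bool use_max)
{
	int cur = node->kswapd_classzone_idx;

	if (cur < 0)
		node->kswapd_classzone_idx = classzone_idx;
	else if (use_max)
		node->kswapd_classzone_idx = cur > classzone_idx ? cur : classzone_idx;
	else
		node->kswapd_classzone_idx = cur < classzone_idx ? cur : classzone_idx;
}

/*
 * ASSUMPTION (simplification for this sketch): the node counts as
 * balanced once any zone at or below classzone_idx has reached its
 * high watermark.
 */
static bool toy_node_balanced(const struct toy_node *node, int classzone_idx)
{
	for (int i = classzone_idx; i >= 0; i--) {
		if (node->zones[i].free_pages >= node->zones[i].high_wmark)
			return true;
	}
	return false;
}

/* True if the woken kswapd would do any reclaim for the merged request. */
static bool toy_kswapd_reclaims(const struct toy_node *node)
{
	return !toy_node_balanced(node, node->kswapd_classzone_idx);
}
```

With a low zone well below its high watermark and a high zone comfortably above it, merging a TOY_ZONE_DMA request and a TOY_ZONE_NORMAL request with max() leaves the merged index at TOY_ZONE_NORMAL, so toy_kswapd_reclaims() returns false and the low zone is never balanced. With min() the merged index is TOY_ZONE_DMA and the model reports that reclaim is needed. Whether the real balance check behaves like this assumption is exactly the point under discussion.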