Date: Wed, 17 Jun 2020 13:53:12 -0700
From: Andrew Morton
To: Nitin Gupta
Cc: Vlastimil Babka, Khalid Aziz, Oleksandr Natalenko, Michal Hocko,
    Mel Gorman, Matthew Wilcox, Mike Kravetz, Joonsoo Kim, David Rientjes,
    Nitin Gupta, linux-kernel, linux-mm, Linux API
Subject: Re: [PATCH v8] mm: Proactive compaction
Message-Id: <20200617135312.4f395479454c55a8d021b023@linux-foundation.org>
In-Reply-To: <20200616204527.19185-1-nigupta@nvidia.com>

On Tue, 16 Jun 2020 13:45:27 -0700 Nitin Gupta wrote:

> For some applications, we need to allocate almost all memory as
> hugepages. However, on a running system, higher-order allocations can
> fail if the memory is fragmented. Linux kernel currently does on-demand
> compaction as we request more hugepages, but this style of compaction
> incurs very high latency. Experiments with one-time full memory
> compaction (followed by hugepage allocations) show that kernel is able
> to restore a highly fragmented memory state to a fairly compacted memory
> state within <1 sec for a 32G system. Such data suggests that a more
> proactive compaction can help us allocate a large fraction of memory as
> hugepages keeping allocation latencies low.
>
> ...
>

All looks straightforward to me and easy to disable if it goes wrong.

All the hard-coded magic numbers are a worry, but such is life.

One teeny complaint:

>
> ...
>
> @@ -2650,12 +2801,34 @@ static int kcompactd(void *p)
>  		unsigned long pflags;
>  
>  		trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
> -		wait_event_freezable(pgdat->kcompactd_wait,
> -				kcompactd_work_requested(pgdat));
> +		if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
> +			kcompactd_work_requested(pgdat),
> +			msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
> +
> +			psi_memstall_enter(&pflags);
> +			kcompactd_do_work(pgdat);
> +			psi_memstall_leave(&pflags);
> +			continue;
> +		}
>  
> -		psi_memstall_enter(&pflags);
> -		kcompactd_do_work(pgdat);
> -		psi_memstall_leave(&pflags);
> +		/* kcompactd wait timeout */
> +		if (should_proactive_compact_node(pgdat)) {
> +			unsigned int prev_score, score;

Everywhere else, scores have type `int'.  Here they are unsigned.
How come?

Would it be better to make these unsigned throughout?  I don't think a
score can ever be negative?

> +			if (proactive_defer) {
> +				proactive_defer--;
> +				continue;
> +			}
> +			prev_score = fragmentation_score_node(pgdat);
> +			proactive_compact_node(pgdat);
> +			score = fragmentation_score_node(pgdat);
> +			/*
> +			 * Defer proactive compaction if the fragmentation
> +			 * score did not go down i.e. no progress made.
> +			 */
> +			proactive_defer = score < prev_score ?
> +					0 : 1 << COMPACT_MAX_DEFER_SHIFT;
> +		}
>  	}
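
To illustrate why consistent score types matter, here is a minimal
userspace sketch, not code from the patch. next_defer() mirrors the
deferral expression in the quoted hunk; mixed_less_than() is a
hypothetical helper showing what C does when an int score is compared
against an unsigned one. The value 6 for COMPACT_MAX_DEFER_SHIFT is an
assumption made for the example; this mail does not state it.

```c
#include <assert.h>

/* Assumed value, for illustration only; not stated in this thread. */
#define COMPACT_MAX_DEFER_SHIFT 6

/*
 * Same rule as the quoted hunk: no deferral when the score went down,
 * otherwise skip the next (1 << COMPACT_MAX_DEFER_SHIFT) timeouts.
 */
static unsigned int next_defer(unsigned int prev_score, unsigned int score)
{
	return score < prev_score ? 0 : 1U << COMPACT_MAX_DEFER_SHIFT;
}

/*
 * Hypothetical mixed-type comparison: the int operand is converted to
 * unsigned int before comparing, so a negative value becomes very large.
 */
static int mixed_less_than(int score, unsigned int prev_score)
{
	return score < prev_score;
}

/* Small self-check exercising both helpers. */
static void demo(void)
{
	assert(next_defer(50, 40) == 0);	/* progress made: no deferral */
	assert(next_defer(40, 40) == 64);	/* no progress: defer 1 << 6 */
	assert(mixed_less_than(5, 10) == 1);
	assert(mixed_less_than(-1, 10) == 0);	/* -1 converts to UINT_MAX */
}
```

If scores really can never be negative, the conversion is harmless, but
using one type throughout avoids having to reason about it at all.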