From: Vlastimil Babka
To: Michal Hocko, Mike Kravetz
Cc: Hillf Danton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND 0/8] hugetlb: add demote/split page functionality
Date: Thu, 9 Sep 2021 15:45:45 +0200
Message-ID: <71f855ac-ff61-1eed-454f-909c0e4210b2@suse.cz>

On 9/9/21 13:54, Michal Hocko wrote:
> On Wed 08-09-21 14:00:19, Mike Kravetz wrote:
>> On 9/7/21 1:50 AM, Hillf Danton wrote:
>> > On Mon, 6 Sep 2021 16:40:28 +0200 Vlastimil Babka wrote:
>> >
>> > And/or clamp reclaim retries for costly orders
>> >
>> > reclaim retries = MAX_RECLAIM_RETRIES - order;
>> >
>> > to pull down the chance for stall as low as possible.
>>
>> Thanks, and sorry for not replying quickly. I only get back to this as
>> time allows.
>>
>> We could clamp the number of compaction and reclaim retries in
>> __alloc_pages_slowpath as suggested. However, I noticed that a single
>> reclaim call could take a bunch of time. As a result, I instrumented
>> shrink_node to see what might be happening. Here is some information
>> from a long stall. Note that I only dump stats when jiffies > 100000.
>>
>> [ 8136.874706] shrink_node: 507654 total jiffies, 3557110 tries
>> [ 8136.881130] 130596341 reclaimed, 32 nr_to_reclaim
>> [ 8136.887643] compaction_suitable results:
>> [ 8136.893276] idx COMPACT_SKIPPED, 3557109
>
> Can you get a more detailed break down of where the time is spent. Also
> How come the number of reclaimed pages is so excessive comparing to the
> reclaim target? There is something fishy going on here.

I would say it's simply should_continue_reclaim() behaving similarly to
should_compact_retry(). We'll get compaction_suitable() returning
COMPACT_SKIPPED because the reclaimed pages have been immediately stolen,
and compaction indicates there are not enough free base pages to begin
with to form a high-order page. Since the stolen pages will appear on the
inactive LRU, it seems to be worth continuing reclaim to make enough free
base pages for compaction to no longer be skipped, because
"inactive_lru_pages > pages_for_compaction" is true.

So, both should_continue_reclaim() and should_compact_retry() are unable
to recognize that reclaimed pages are being stolen, and so they cannot
limit the retries in that case. The scenario seems to be uncommon,
otherwise we'd be getting more reports of it.
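
To make the loop described above concrete, here is a toy userspace model. It is only a sketch, not the code in mm/vmscan.c: every toy_* name, struct toy_zone, the 32-page batch and the "2UL << order" threshold are assumptions made for illustration. It shows why the "continue" check never turns false when every reclaimed page is stolen and lands back on the inactive LRU.

```c
#include <stdbool.h>

/* Toy model only. Names, sizes and thresholds are illustrative assumptions. */
enum toy_compact_result { TOY_COMPACT_SKIPPED, TOY_COMPACT_CONTINUE };

struct toy_zone {
	unsigned long free_pages;
	unsigned long inactive_lru_pages;
};

/* Assumed threshold: free base pages compaction wants before it will run. */
static unsigned long toy_pages_for_compaction(unsigned int order)
{
	return 2UL << order;
}

static enum toy_compact_result
toy_compaction_suitable(const struct toy_zone *z, unsigned int order)
{
	return z->free_pages >= toy_pages_for_compaction(order) ?
		TOY_COMPACT_CONTINUE : TOY_COMPACT_SKIPPED;
}

/* Keep reclaiming while compaction is skipped and the LRU still has enough pages. */
static bool toy_should_continue_reclaim(const struct toy_zone *z,
					unsigned int order,
					unsigned long nr_reclaimed)
{
	if (!nr_reclaimed)
		return false;
	if (toy_compaction_suitable(z, order) != TOY_COMPACT_SKIPPED)
		return false;
	return z->inactive_lru_pages > toy_pages_for_compaction(order);
}

/*
 * Run reclaim rounds of up to 32 pages each and return how many rounds ran.
 * With 'stolen' set, a concurrent allocator takes every freed page and the
 * page shows up on the inactive LRU again, so no progress is ever visible.
 * 'max_rounds' is an artificial cap so the model terminates.
 */
static unsigned long toy_reclaim_rounds(struct toy_zone *z, unsigned int order,
					bool stolen, unsigned long max_rounds)
{
	unsigned long rounds = 0;
	unsigned long nr_reclaimed;

	do {
		nr_reclaimed = z->inactive_lru_pages < 32 ?
			       z->inactive_lru_pages : 32;
		z->inactive_lru_pages -= nr_reclaimed;
		z->free_pages += nr_reclaimed;
		if (stolen) {
			z->free_pages -= nr_reclaimed;
			z->inactive_lru_pages += nr_reclaimed;
		}
		rounds++;
	} while (rounds < max_rounds &&
		 toy_should_continue_reclaim(z, order, nr_reclaimed));

	return rounds;
}
```

In this model, an order-9 request against a zone with 100000 inactive pages stops after 32 rounds when nothing is stolen, since free_pages then reaches the assumed threshold of 1024. With stealing enabled it only stops at the artificial max_rounds cap, which is the kind of bound the real check currently lacks.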